1442 Commits

Author SHA1 Message Date
Michael Kruse 4485ae0890 [CodeGen] Allow undefined loads in statement instances outside context.
A check in assert-builds was meant to verify that a load provides a
value in all statement instances (i.e. its domain).  The domain is
commonly gist'ed within the parameter context to contain fewer
constraints.  However, statement instances outside the context are
not valid executions, hence the value provided can be undefined.

Refine the check for valid loads so that the value only needs to be defined
within the SCoP context.

In addition, the JSONImporter had to be changed to allow importing
access relations that are broader than the current access relation,
but still defined over all statement instances.

This should fix the compiler crash in test-suite's oggenc of the
-polly-process-unprofitable buildbot.

llvm-svn: 329655
2018-04-10 01:20:51 +00:00
Michael Kruse db6f71e48d [ScopInfo] Avoid iterator invalidation.
Commit r329640 introduced the removal of all MemoryAccesses of a Scop.
It accidentally continued iterating over a vector whose iterators
have been invalidated by a MemoryAccess removal.

Iterate over a copy of the list of MemoryAccesses to remove while
removing them.

llvm-svn: 329653
2018-04-10 01:20:41 +00:00
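
As a rough illustration of the fix described in db6f71e48d above, the following sketch shows the general "iterate over a copy" pattern. `Stmt`, `removeAccess` and `removeAllAccesses` are hypothetical stand-ins, not Polly's actual classes or methods.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a statement owning a list of access ids.
struct Stmt {
  std::vector<int> Accesses;

  // Erasing from the vector invalidates iterators at and after the erased
  // element, so callers must not hold iterators into Accesses across this call.
  void removeAccess(int Id) {
    auto It = std::find(Accesses.begin(), Accesses.end(), Id);
    assert(It != Accesses.end() && "access does not belong to this statement");
    Accesses.erase(It);
  }

  // Iterate over a copy; the original vector can then shrink safely.
  void removeAllAccesses() {
    const std::vector<int> ToRemove = Accesses;
    for (int Id : ToRemove)
      removeAccess(Id);
  }
};
```

Iterating over `Accesses` directly in `removeAllAccesses` would be undefined behavior, because each `erase` invalidates the range-for iterator.
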
Michael Kruse 192e7f72ca [ScopInfo] Completely remove MemoryAccesses when their parent statement is removed.
Removing a statement left its MemoryAccesses in some lists and maps of
the SCoP.  Which lists depends on the phase of the SCoP construction in
which the statement is deleted.  Follow-up passes could still see
the already deleted MemoryAccesses by iterating through these
lists/maps, resulting in an access violation.

When removing a ScopStmt, also remove all its MemoryAccesses by using
the same mechanism that removes a MemoryAccess.

llvm-svn: 329640
2018-04-09 23:13:05 +00:00
Michael Kruse df8e140349 Remove immediate dominator heuristic for error block detection.
This patch removes the heuristic in
- Polly :: lib/Support/ScopHelper.cpp

The heuristic forces blocks that directly follow a loop header not to be considered error blocks.
It was introduced in r249611 with the following commit message:

>   This replaces the support for user defined error functions by a
>   heuristic that tries to determine if a call to a non-pure function
>   should be considered "an error". If so the block is assumed not to be
>   executed at runtime. While treating all non-pure function calls as
>   errors will allow a lot more regions to be analyzed, it will also
>   cause us to dismiss a lot again due to an infeasible runtime context.
>   This patch tries to limit that effect. A non-pure function call is
>   considered an error if it is executed only in conditionally with
>   regards to a cheap but simple heuristic.

In the code below `CCK_Abort2()` would be considered as an error block, but not `CCK_Abort1()` due to this heuristic.
```
for (int i = 0; i < n; i+=1) {
  if (ErrorCondition1)
    CCK_Abort1(); // No __attribute__((noreturn))
  if (ErrorCondition2)
    CCK_Abort2(); // No __attribute__((noreturn))
}
```

This does not seem useful. Checking error conditions at the beginning of some work is quite common. The heuristic causes a switch default case not to be considered an error block in SPEC's cactuBSSN. The comment justifying the heuristic mentions a "load", which does not seem to be applicable here. It has been proposed to remove the heuristic.

In addition, the patch fixes the following test cases:
- Polly :: ScopDetect/mod_ref_read_pointer.ll
- Polly :: ScopInfo/max-loop-depth.ll
- Polly :: ScopInfo/mod_ref_access_pointee_arguments.ll
- Polly :: ScopInfo/mod_ref_read_pointee_arguments.ll
- Polly :: ScopInfo/mod_ref_read_pointer.ll
- Polly :: ScopInfo/mod_ref_read_pointers.ll

The test cases failed after removing the heuristic.

Differential Revision: https://reviews.llvm.org/D45274

Contributed-by: Lorenzo Chelini <l.chelini@icloud.com>
llvm-svn: 329548
2018-04-09 06:07:44 +00:00
Huihui Zhang 71e54ccd06 [Polly][IslAst] Fix minimal dependence distance.
Summary:
When checking the parallelism of a scheduling dimension, we first check whether the loop is parallel when reduction dependences are excluded.
If the loop is not parallel, then we need to return the minimal dependence distance of all data dependences, including the previously subtracted reduction dependences.


Reviewers: grosser, Meinersbur, efriedma, eli.friedman, jdoerfert, bollu

Reviewed By: Meinersbur

Subscribers: llvm-commits, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D45236

llvm-svn: 329214
2018-04-04 18:08:13 +00:00
Tobias Grosser e5340a8ce9 Move code generation test case to test/CodeGen/
llvm-svn: 327857
2018-03-19 15:05:30 +00:00
Philip Pfaffe 15186d4938 [Polly][CMake] Fix lit setup for building in the monorepo
Summary:
When building polly as part of the monorepo (actually, as part of any setup
using LLVM_ENABLE_PROJECTS), the LLVMPolly library used in the lit tests ends
up in a different directory in the build tree than in an in-tree build.

Reviewers: Meinersbur, grosser, bollu

Reviewed By: Meinersbur

Subscribers: mgorny, bollu, pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D44078

llvm-svn: 326702
2018-03-05 14:43:04 +00:00
Tobias Grosser b94863001a [ScopInfo] Do not use the set dimension ids to carry loop information
isl does not guarantee that set dimension ids will be preserved, so using them
to carry information is not a good idea. Furthermore, the loop information can
be derived without problem from the statement itself. As this even requires
less code than propagating loop information on set dimension ids, starting from
this commit we just derive the loop information in collectSurroundingLoops
directly from the IR.

Interestingly this also results in a couple of isl sets taking a simpler
representation.

llvm-svn: 326664
2018-03-03 19:27:54 +00:00
Tobias Grosser fa8079d0dc Update isl to isl-0.18-1047-g4a20ef8
This update:

  - Removes several deprecated functions (e.g., isl_band).
  - Improves the pretty-printing of sets by detecting modulos and "false"
    equalities.
  - Minor improvements to coalescing and increased robustness of the isl
    scheduler.

This update does not yet include isl commit isl-0.18-90-gd00cb45
(isl_pw_*_alloc: add missing check for compatible spaces, Wed Sep 6 12:18:04
2017 +0200), as this additional check is too tight and unfortunately causes
two test case failures in Polly. A patch has been submitted to isl and will be
included in the next isl update for Polly.

llvm-svn: 325557
2018-02-20 07:26:42 +00:00
Michael Kruse a6716d9d81 [ScopBuilder] scalar-indep: Fix mutually referencing PHIs.
Two or more PHIs mutually using each other directly or indirectly as
incoming value could cause a PHI WRITE to be added before the PHI READ
(i.e. it overwrites the current incoming value with the next incoming
value before it is read).

Fix by ensuring that the PHI WRITE and PHI READ are in the same statement.

This should fix the miscompile of SingleSource/Benchmark/Misc/whetstone
from the test-suite.

llvm-svn: 324934
2018-02-12 21:09:40 +00:00
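
The ordering problem described in a6716d9d81 above can be sketched with two values that swap on every loop iteration, as two mutually referencing PHIs would. This is only a plain C++ model of the read/write order; the function names are hypothetical and nothing here uses Polly's data structures.

```cpp
#include <cassert>
#include <utility>

// Correct PHI semantics: every current value is read before any next value
// is written.
std::pair<int, int> rotate(int A, int B, int Iterations) {
  assert(Iterations >= 0 && "iteration count must be non-negative");
  for (int I = 0; I < Iterations; ++I) {
    int NextA = B; // PHI READ of B
    int NextB = A; // PHI READ of A
    A = NextA;     // PHI WRITE of A
    B = NextB;     // PHI WRITE of B
  }
  return {A, B};
}

// Broken ordering: the write to A happens before the old A has been read,
// so B receives the already overwritten value.
std::pair<int, int> rotateClobbering(int A, int B, int Iterations) {
  assert(Iterations >= 0 && "iteration count must be non-negative");
  for (int I = 0; I < Iterations; ++I) {
    A = B; // PHI WRITE of A, too early
    B = A; // reads the new A instead of the previous one
  }
  return {A, B};
}
```

Starting from (1, 2), one iteration of `rotate` yields (2, 1), whereas `rotateClobbering` yields (2, 2) and loses a value.
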
Michael Kruse a43ba2d84f [ScopBuilder] Make -polly-stmt-granularity=scalar-indep the default.
Splitting basic blocks into multiple statements if there are no
additional scalar dependencies gives more freedom to the scheduler, but
more statements also mean higher compile-time complexity. Switch to the
finer statement granularity; the additional compile time should be
limited by the number of operations quota.

The regression tests are written for the -polly-stmt-granularity=bb
setting, therefore we add that flag to those tests that break with the
new default. Some of the tests only fail because the statements are
named differently due to a basic block resulting in multiple statements,
but which are removed during simplification of statements without
side-effects. Previous commits tried to reduce this effect, but it is
not completely avoidable.

Differential Revision: https://reviews.llvm.org/D42151

llvm-svn: 324169
2018-02-03 06:59:47 +00:00
Michael Kruse a230f22f4b [ScopBuilder] Prefer PHI Write accesses in the statement the incoming value is defined.
Theoretically, a PHI write can be added to any statement that represents
the incoming basic block. We previously always chose the last because
the incoming value is guaranteed to be defined there.

With this patch the PHI write is added to the statement that defines the
incoming value. It avoids the requirement for a scalar dependency between
the defining statement and the statement containing the write. As such the
logic for -polly-stmt-granularity=scalar-indep that ensures that there is
such scalar dependencies can be removed.

Differential Revision: https://reviews.llvm.org/D42147

llvm-svn: 323284
2018-01-23 23:56:36 +00:00
Dimitry Andric e6de5a100d Assume the shared library path variable is LD_LIBRARY_PATH on systems
except Darwin and Windows.  This prevents inserting an environment
variable with an empty name (which is illegal and leads to a Python
exception) on any of the BSDs.

llvm-svn: 323041
2018-01-20 14:35:05 +00:00
Daniel Neilson 751a2cebc5 Change memcpy/memmove/memset to have dest and source alignment attributes (Step 1).
Summary:
 Upstream LLVM is changing the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This commit updates the Polly tests accordingly.

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that constraint and allow
the source and destination to have different alignments.

llvm-svn: 322963
2018-01-19 17:12:48 +00:00
Michael Kruse 9cfb0ac223 [ScopBuilder] Revise statement naming when there are multiple statements per BB.
The goal is for -polly-stmt-granularity=bb and
-polly-stmt-granularity=scalar-indep to produce the same names if there is
just one statement per basic block.

This fixes a fluke when Polybench's jacobi-2d is optimized differently
depending on the -polly-stmt-granularity option, although both options
create the same SCoP, just with different statement names.

The new naming scheme is:

With -polly-use-llvm-names=0:
Stmt<BBIdx as decimal><Idx within BB as letter>

With -polly-use-llvm-names=1:
Stmt_BBName_<Idx within BB as letter>

The <Idx within BB> suffix is omitted for the main statement of a BB. The
main statement is either the one containing the first store or call
(those cannot be removed by the simplifier), or if there is no such
instruction, the first. If after simplification there is just a single
statement left, it should be the main statement and have the same names as
with -polly-stmt-granularity=bb.

Differential Revision: https://reviews.llvm.org/D42136

llvm-svn: 322852
2018-01-18 15:15:50 +00:00
Eli Friedman a75d53c83f [polly] [ScopInfo] Don't use isl_val_get_num_si.
isl_val_get_num_si crashes on overflow, so don't use it on arbitrary
integers.

Testcase only crashes on platforms where long is 32 bits because of the
signature of isl_val_get_num_si; not sure if it's possible to write a
testcase which crashes if long is 64 bits.

There are a few other places in polly which use isl_val_get_num_si;
they probably need to be fixed as well. I don't think polly uses any
of the other "long" isl APIs in an unsafe manner.

Differential Revision: https://reviews.llvm.org/D42129

llvm-svn: 322766
2018-01-17 21:59:02 +00:00
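
The portability issue in a75d53c83f above comes from narrowing an arbitrary integer to `long`, which may be only 32 bits wide. The sketch below shows a generic range check before narrowing; `fitsIn` and `narrowTo` are hypothetical helpers and do not call into isl.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if the 64-bit value V is representable in the signed
// integer type T (e.g. a 32-bit long on some platforms).
template <typename T> bool fitsIn(std::int64_t V) {
  static_assert(std::numeric_limits<T>::is_integer &&
                    std::numeric_limits<T>::is_signed,
                "T must be a signed integer type");
  return V >= std::numeric_limits<T>::min() &&
         V <= std::numeric_limits<T>::max();
}

// Narrows V to T; callers are expected to have checked fitsIn<T>(V) first.
template <typename T> T narrowTo(std::int64_t V) {
  assert(fitsIn<T>(V) && "value does not fit the target type");
  return static_cast<T>(V);
}
```

A caller that cannot guarantee the range would test `fitsIn<long>(V)` and take a fallback path instead of narrowing.
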
Michael Kruse 271deb17b0 [CodeGen] Fix noalias annotations for memcpy/memmove.
Memory transfer instructions take two pointers. It is not defined to
which of those a noalias annotation applies. To ensure correctness,
do not add noalias annotations to memcpy/memmove instructions anymore.

This caused a miscompile with test-suite's MultiSource/Applications/obsequi.
Since r321138, the MemCpyOpt pass would remove memcpy/memmove calls if
known to copy uninitialized memory. In that case, it was initialized
by another memcpy, but the annotation for the target pointer said
it would not alias. The annotation was actually meant for the source
pointer, which was an alloca and could not alias with the target
pointer.

llvm-svn: 321371
2017-12-22 17:44:53 +00:00
Michael Kruse 5c2441901f Fix isl out-of-quota errors affecting later quota guards.
If an out-of-quota error occurred, the last error would be
isl_error_quota unless a different error occurred. We typically check
whether the max-operations limit was reached by comparing to that error
value after leaving the quota guard. This would check whether there ever
was a quota error, not just in the last quota guard.

The observable bug occurred if the max-operations limit was reached in
DeLICM, and if -polly-dependences-computout=0, DependenceInfo would
think that the quota for computing dependencies was the reason,
i.e., fail the operation even if the calculation itself was successful.

Fix by resetting the last error to isl_error_none when entering a
quota guard, signaling that no quota error occurred unless in the
guard's scope.

llvm-svn: 321329
2017-12-22 01:10:31 +00:00
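
The stale-error problem fixed in 5c2441901f above can be modelled with a context that remembers only its last error. The sketch below is a simplified model under that assumption; `Ctx`, `QuotaGuard` and `quotaExceeded` are hypothetical names and not the isl or Polly API.

```cpp
#include <cassert>

// Hypothetical model of a context that remembers only its last error.
enum class Error { None, Quota, Other };

struct Ctx {
  Error LastError = Error::None;
  unsigned long Ops = 0;
  unsigned long MaxOps = 0; // 0 means "no limit"

  void doOperations(unsigned long N) {
    assert(Ops + N >= Ops && "operation counter overflow");
    Ops += N;
    if (MaxOps != 0 && Ops > MaxOps)
      LastError = Error::Quota;
  }
};

// RAII guard that limits the number of operations within its scope.
class QuotaGuard {
  Ctx &C;

public:
  QuotaGuard(Ctx &C, unsigned long MaxOps) : C(C) {
    // The fix: forget errors left behind by earlier guards.
    C.LastError = Error::None;
    C.Ops = 0;
    C.MaxOps = MaxOps;
  }
  ~QuotaGuard() { C.MaxOps = 0; }
};

// Checked after leaving a guard, as described in the commit message.
inline bool quotaExceeded(const Ctx &C) { return C.LastError == Error::Quota; }
```

Without the reset in the constructor, a second guard with no limit would still report a quota error left over from the first one.
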
Michael Kruse 5f0e8a46cf [ScopBuilder] Split statements on encountering store instructions.
Introduce -polly-stmt-granularity=store option.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D37337

llvm-svn: 320360
2017-12-11 12:51:24 +00:00
Philip Pfaffe f6f8b25e58 [NFC] In GPGPU testcases, replace numeric registers in CHECK directives.
Using numeric registers is flaky, since as soon as one additional
instruction is generated by us, all the tests need to be adapted.

llvm-svn: 319544
2017-12-01 14:16:39 +00:00
Philip Pfaffe 4fe21814d1 Handle Top-Level-Regions in polly::isHoistableLoad
Summary:
This can be seen as a follow-up on my previous differential [D33411](https://reviews.llvm.org/D33411).
We received a bug report where this error was triggered. I have tried my best to recreate the issue in a minimal lit testcase which is also part of this differential.

I only handle return instructions as predecessors to a virtual TLR-exit right now. From inspecting the codebase, it seems `unreachable` instructions may also be of interest here. If requested, I can extend my patches to consider them as well. I would also apply this on `ScopHelper.cpp::isErrorBlock` (see D33411), of course.

Reviewers: philip.pfaffe, bollu

Reviewed By: bollu

Subscribers: Meinersbur, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D40492

llvm-svn: 319431
2017-11-30 13:06:10 +00:00
Michael Kruse 163cacb469 [CodeGen] Detect empty domain because of parameters context.
Isl does not allow generating isl_ast_expr from an isl_pw_aff that has an
empty domain (i.e. has no pieces). We already detected the case if the
isl_pw_aff comes with an empty domain.

isl_ast_build also considers the domain empty if it is disjoint with the
parameter context (e.g. parameters values that we exclude by runtime
versioning).

Intersect the access relation domain with the parameter context to
also detect such practically empty access domains. The effective
pointer used in the generated code is unimportant because it will never
be executed.

This fixes llvm.org/PR35362

llvm-svn: 318806
2017-11-21 22:11:10 +00:00
Philip Pfaffe 00fd43b327 Port ScopInfo to the isl cpp bindings
Summary:
Most changes are mechanical, but in one place I changed the program semantics
by fixing a likely bug:

In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitly handling the
error-case. Before, when the call to `addNonEmptyDomainConstraints()`
returned a null set, this (probably) accidentally worked because
isl_bool_error converts to true. I'm checking for nullptr now.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: Meinersbur

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D39971

llvm-svn: 318632
2017-11-19 22:13:34 +00:00
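
The pitfall mentioned in 00fd43b327 above, an error value that converts to true, can be sketched with a C-style tri-state boolean. `TriBool` only mimics the shape of such an enum (error as a nonzero value, here assumed to be -1); `Feasibility`, `naiveIsTrue` and `classify` are hypothetical names.

```cpp
#include <cassert>

// Mimics a C-style tri-state boolean whose error value is nonzero.
enum TriBool { TB_Error = -1, TB_False = 0, TB_True = 1 };

enum class Feasibility { Infeasible, Feasible, Unknown };

// Pitfall: a plain conversion to bool maps the error value to true.
inline bool naiveIsTrue(TriBool B) { return static_cast<bool>(B); }

// Explicit handling: the error case gets its own outcome.
inline Feasibility classify(TriBool B) {
  if (B == TB_Error)
    return Feasibility::Unknown;
  if (B == TB_True)
    return Feasibility::Feasible;
  assert(B == TB_False && "unexpected tri-state value");
  return Feasibility::Infeasible;
}
```

Code that relies on `naiveIsTrue` silently treats a failed computation as success, which is why handling the error case separately is the safer design.
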
Michael Kruse 68821a8b91 [ZoneAlgo/ForwardOpTree] Normalize PHIs to their known incoming values.
Represent PHIs by their incoming values instead of an opaque value of
themselves. This allows ForwardOpTree to "look through" the PHIs and
forward the incoming values since forwarding PHIs is currently not
supported.

This is particularly useful to cope with PHIs inserted by GVN LoadPRE.
The incoming values all resolve to a load from a single array element
which then can be forwarded.

It should in theory also reduce spurious conflicts in value mapping
(DeLICM), but I have not found a profitable case yet, so it is
not included here.

To avoid transitive closure and potentially necessary overapproximations
of those, PHIs that may reference themselves are excluded from
normalization and keep their opaque self-representation.

Differential Revision: https://reviews.llvm.org/D39333

llvm-svn: 317008
2017-10-31 16:11:46 +00:00
Michael Kruse ff426d974d [DeLICM] Fix wrong assumed access execution order.
ForwardOpTree may already transform a scalar access to an array
access. The access remains implicit (isOriginalScalarKind(), meaning
that the access is always executed at the begin/end of a statement), but
targets an array (isLatestArrayKind(), which is unrelated to whether the
execution is implicit/explicit).

Fix by properly using isOriginalXXX() to determine execution order.

This fixes the buildbots on MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG.

llvm-svn: 316995
2017-10-31 12:50:25 +00:00
Michael Kruse 06618bf71a [OpenMP] Fix reference collection of latest base ptrs.
When collecting base pointers that need to be made available in parallel
subfunctions, use the base pointer associated with the latest
ScopArrayInfo, instead of the original one.

llvm-svn: 316983
2017-10-31 10:28:22 +00:00
Philip Pfaffe 9b1d1e6ae7 Fix two testcases. NFC intended.
Add missing %loadPolly directive to support out of tree builds. One of
the changes is somewhat bigger, because the directive turns on LLVM
names, and the testcase doesn't use those.

llvm-svn: 316870
2017-10-29 21:00:48 +00:00
Michael Kruse 822dfe271b [ForwardOpTree] Reload known values.
For scalar accesses, change the access target to an array element that
is known to contain the same value.

This may become an alternative to forwardKnownLoad which creates new
loads (and therefore closer to forwarding speculatives). Reloading does
not require the known value to originate from a load; it can come from a
store as well.

Differential Revision: https://reviews.llvm.org/D39325

llvm-svn: 316766
2017-10-27 14:26:14 +00:00
Michael Kruse b6b65834a1 [Simplify] Mark (and sweep) based on latest access relation.
Previously we marked scalars based on the original access function. However,
when a scalar read access is redirected, the original definition
(or incoming values of a PHI) is not used anymore, and can be deleted
(unless referenced by a use that has not been redirected).

llvm-svn: 316660
2017-10-26 12:34:36 +00:00
Michael Kruse 37d57dac63 [DeLICM] Add more tests for loop layouts. NFC.
llvm-svn: 316642
2017-10-26 08:03:28 +00:00
Michael Kruse 19cd61dc11 [DeLICM] Do not try to map to multiple array elements.
Add a check and skip when the store used to determine the target accesses
multiple array elements. Only a single array location should be used for
mapping the scalar. Having multiple creates problems when deciding which
element to load from. While MemoryAccess::getAddressFunction() should
select just one of them, other problems arise in code that assumes
that there is just one target element per statement instance.

This fixes llvm.org/PR34989

This also reverts r313902 which fixed llvm.org/PR34485 also caused by
a non-functional target array element. This patch prevents the situation
from occurring in the first place.

llvm-svn: 316432
2017-10-24 13:05:24 +00:00
Anna Thomas 0026d91437 [Polly] Add XFAIL to large-numbers-in-boundary-context.ll
After rL315683 (improve SCEV to calculate max BETakenCount when end
bound of loop is variant and loop is of form {Start,+1, Stride} LT End)
this test in polly started failing.
However, as discussed in https://reviews.llvm.org/rL315683,
this polly test is not a loops bound test and the MaxBECount calculated by
SCEV looks correct. The max BECount is the value calculated even when the end
bound of loop is invariant.

As discussed with Tobias offline, I'm marking this as an XFAIL, until he
gets a chance to update the testcase, so the build bot goes to green.

llvm-svn: 315912
2017-10-16 15:12:39 +00:00
Michael Kruse cc345e6e94 [ScopBuilder] Introduce -polly-stmt-granularity=scalar-indep option.
The option splits BasicBlocks into minimal statements such that no
additional scalar dependencies are introduced.

The algorithm is based on a union-find structure, and unites sets if
putting them into separate statements would introduce a scalar
dependency. As a consequence, instructions may be split into separate
statements such that their relative order differs from the order of the
statements they are in. This is accounted for in the case of instructions
whose relative order matters (e.g. memory accesses).

The algorithm is generic in that heuristic changes can be made
relatively easily. We might relax the order requirement for read-reads
or accesses to different base pointers. Forwardable instructions can be
made to not cause a join.

This implementation gives us a speed-up of 82% in SPEC 2006 456.hmmer
benchmark by allowing loop-distribution in a hot loop such that one of
the loops can be vectorized.

Differential Revision: https://reviews.llvm.org/D38403

llvm-svn: 314983
2017-10-05 13:43:00 +00:00
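
The union-find idea described in cc345e6e94 above can be sketched as follows. This is a simplified model that only groups instructions connected by scalar dependencies; it ignores the ordering constraints for memory accesses that the commit message mentions. `UnionFind` and `splitIntoStatements` are hypothetical names, and instructions are represented by plain indices.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Minimal union-find over instruction indices 0..N-1.
class UnionFind {
  std::vector<int> Parent;

public:
  explicit UnionFind(int N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), 0);
  }

  int find(int X) {
    assert(X >= 0 && X < static_cast<int>(Parent.size()) && "index out of range");
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }

  void unite(int A, int B) { Parent[find(A)] = find(B); }
};

// Returns, for each instruction, the representative of the statement it is
// assigned to. Instructions connected by a scalar dependency (Def, Use) end
// up in the same statement, so no new scalar dependency crosses statements.
std::vector<int>
splitIntoStatements(int NumInsts,
                    const std::vector<std::pair<int, int>> &ScalarDeps) {
  UnionFind UF(NumInsts);
  for (const auto &Dep : ScalarDeps)
    UF.unite(Dep.first, Dep.second);

  std::vector<int> Leader(NumInsts);
  for (int I = 0; I < NumInsts; ++I)
    Leader[I] = UF.find(I);
  return Leader;
}
```

With four instructions and dependencies 0 -> 1 and 2 -> 3, the sketch produces two groups, {0, 1} and {2, 3}, which could then be scheduled independently.
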
Tobias Grosser c52b71db15 [GPGPU] Make sure escaping invariant load hoisted scalars are preserved
We make sure that the final reload of an invariant scalar memory access uses the
same stack slot into which the invariant memory access was stored originally.
Earlier, this was broken as we introduced a new stack slot alongside the preload
stack slot, which remained uninitialized and caused our escaping loads to
contain garbage. This happened due to us clearing the pre-populated values
in EscapeMap after kernel code generation. We address this issue by preserving
the original host values and restoring them after kernel code generation.
EscapeMap is not expected to be used during kernel code generation, hence we
clear it during kernel generation to make sure that any unintended uses are
noticed.

llvm-svn: 314894
2017-10-04 10:24:23 +00:00
Jakub Kuderski 119753ad14 UnXFAIL tests that previously failed VerifyDFSNumbers
They started passing again by the DT::eraseNode fix in r314847.

llvm-svn: 314850
2017-10-03 21:23:56 +00:00
Jakub Kuderski 3c3bf74022 XFAIL two tests that fail VerifyDFSNumbers DominatorTree check
This patch XFAILs two tests that start to fail when verifying DT's
DFS numbers, as per Tobias' suggestion.

Related VerifyDFSNumbers patch: D38331.

llvm-svn: 314800
2017-10-03 14:31:53 +00:00
Michael Kruse f5745b4e7d [ScopBuilder] Build invariant loads separately.
Create the MemoryAccesses of invariant loads separately and before
all other MemoryAccesses.

Invariant loads are classified as synthesizable and therefore are not
contained in any statement. When iterating over all instructions of all
statements, the invariant loads are consequently not processed and
iterating over them separately becomes necessary.

This patch can change the order in which MemoryAccesses are created, but
otherwise has no functional change.

Some temporary code is introduced to ensure correctness, but will be
removed in the next commit.

llvm-svn: 314664
2017-10-02 11:41:27 +00:00
Michael Kruse 89a6f3db02 [ScopBuilder] Build escaping dependencies separately.
Instructions that compute escaping values might be synthesizable and
therefore not contained in any ScopStmt. When buildAccessFunctions is
changed to only iterate over the instruction list of statement,
"free" instructions still need to be written. We do this after the
main MemoryAccesses have been created.

This can change the order in which MemoryAccesses are created, but has
otherwise no functional change.

llvm-svn: 314663
2017-10-02 11:41:19 +00:00
Michael Kruse c013399197 [ScopDetect] Do not add loads out of the SCoP to required invariant loads.
Loads before the SCoP are always invariant within the SCoP and
therefore are not "required invariant loads". An assertion fails in
ScopBuilder when it finds such an invariant load.

Fix by not adding such loads to the required invariant load list. This
likely will cause the region to be not considered a valid SCoP.
We may want to unconditionally accept instructions defined before
the region as valid invariant conditions instead of rejecting them.

This fixes a compilation crash of SPEC CPU2006 453.povray's
render.cpp.

llvm-svn: 314636
2017-10-01 22:19:28 +00:00
Tobias Grosser d215e684b3 Add missing REQUIRES line
llvm-svn: 314625
2017-10-01 13:14:40 +00:00
Tobias Grosser 2fb847fbf6 [GPGPU] Set Polly's RTC to false in case invariant load hoisting fails
This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and
makes sure that we fall back to the original code. It seems when invariant load
hoisting was introduced to the GPGPU backend we forgot to reset the RTC flag,
such that kernels where invariant load hoisting failed executed the 'optimized'
SCoP, which however is set to a simple 'unreachable'. Unsurprisingly, this
results in hard to debug issues that are a lot of fun to debug.

llvm-svn: 314624
2017-10-01 12:39:14 +00:00
Tobias Grosser 1f93d0f1f9 [ScopInfo] Allow PHI nodes that reference an error block
As long as these PHI nodes are only referenced by terminator instructions.

llvm-svn: 314212
2017-09-26 15:00:10 +00:00
Tobias Grosser 5e531dfef4 [ScopInfo] Allow invariant loads in branch conditions
In case the value used in a branch condition is a load instruction, assume this
load to be invariant.

llvm-svn: 314146
2017-09-25 20:27:15 +00:00
Tobias Grosser 0a62b2d887 [ScopInfo] Allow uniform branch conditions
If all but one branch come from an error condition and the incoming value from
this branch is a constant, we can model this branch.

llvm-svn: 314116
2017-09-25 16:37:15 +00:00
Tobias Grosser ee457594c2 [ScopDetect/Info] Look through PHIs that follow an error block
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.

This is a recommit of r312663 after fixing
test/Isl/CodeGen/phi_after_error_block_outside_of_scop.ll

llvm-svn: 314075
2017-09-24 09:25:30 +00:00
Tobias Grosser 75d133f0ac [IslExprBuilder] Do not generate RTC with more than 64 bit
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.

Thanks to Eli Friedman for reporting this issue and reducing the test case.

llvm-svn: 314065
2017-09-23 15:32:07 +00:00
Michael Kruse bfca5f4334 [DeLICM] Allow non-injective PHIRead->PHIWrite mapping.
Remove an assertion that tests the injectivity of the
PHIRead -> PHIWrite relation.  That is, allow a single PHI write to be
used by multiple PHI reads.  This may happen due to some statements
containing the PHI write not having the statement instances that would
overwrite the previous incoming value due to (assumed/invalid) contexts.
This results in the PHI write being mapped to multiple targets, which is not
supported.  Codegen will select one of the targets using
getAddressFunction().  However, the runtime check should protect us from
this case ever being executed.

We therefore allow non-injective PHI relations.  Additional calculations to
detect/sanitize this case would probably not be worth the computational
effort.

This fixes llvm.org/PR34485

llvm-svn: 313902
2017-09-21 19:08:23 +00:00
Michael Kruse 6d7a7896ce [ScopInfo] Use map for value def/PHI read accesses.
Before this patch, ScopInfo::getValueDef(SAI) used
getStmtFor(Instruction*) to find the MemoryAccess that writes a
MemoryKind::Value. In cases where the value is synthesizable within the
statement that defines it, the instruction is not added to the statement's
instruction list, which means getStmtFor() won't return anything.

If the synthesizable instruction is not synthesizable in a different
statement (due to being defined in a loop and ScalarEvolution being
unable to derive its escape value), we still need a MemoryKind::Value
and a write to it that makes it available in the other statements.
Introduce a separate map for this purpose.

This fixes MultiSource/Benchmarks/MallocBench/cfrac where
-polly-simplify could not find the writing MemoryAccess for a use. The
write was not marked as required and consequently was removed.

Because this could in principle happen as well for PHI scalars,
add such a map for PHI reads as well.

llvm-svn: 313881
2017-09-21 14:23:11 +00:00
Michael Kruse 0e370cf1a7 Check whether IslAstInfo and DependenceInfo were computed for the same Scop.
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get analyses that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.

When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
one is therefore unreliable.

Instead, we compare the address of the isl_ctx object. Both DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses, which might happen after
the destruction of the Scop/ScopInfo they refer to.  Hence, the isl_ctx
will not be freed and its address not reused as long as there is a
DependenceInfo or IslAstInfo around.

This fixes llvm.org/PR34441

llvm-svn: 313842
2017-09-21 00:01:13 +00:00
Michael Kruse 8dceb76066 [ScheduleOptimizer] Fix and test schedule tree statistics.
Fix walking over the schedule tree to collect its properties
(Number of permutable bands etc.).

Also add regression tests for these statistics.

llvm-svn: 313750
2017-09-20 11:53:05 +00:00
Michael Kruse ef8325ba50 [ForwardOpTree] Test the max operations quota.
cl::opt<unsigned long> is not specialized and hence the option
-polly-optree-max-ops is impossible to use.

Replace it with the supported option type cl::opt<unsigned>.

Also check for an error state when computing the written value, which
happens when the quota runs out.

llvm-svn: 313546
2017-09-18 17:43:50 +00:00
Michael Kruse eac3eebfea [test] Enable -polly-codegen-verify for regression tests.
In r301670 IR verification was disabled. Since then, CodeGen writing
malformed IR would only be noticed by unpredictable behavior in
follow-up passes (e.g. segfaults, infinite loops) or IR verification in
the backend assert builds.

Re-enable -polly-codegen-verify for the regression tests to ensure
that malformed IR is detected where Polly generated malformed IR in the
past and changes in CodeGen are at least partially covered by
check-polly
(otherwise malformed IR may only get noticed when the buildbots run the
test-suite).

Differential Revision: https://reviews.llvm.org/D37969

llvm-svn: 313527
2017-09-18 12:34:11 +00:00
Zachary Turner ce92db13ea Resubmit "[lit] Force site configs to run before source-tree configs"
This is a resubmission of r313270.  It broke standalone builds of
compiler-rt because we were not correctly generating the llvm-lit
script in the standalone build directory.

The fixes incorporated here attempt to find llvm/utils/llvm-lit
from the source tree returned by llvm-config.  If present, it
will generate llvm-lit into the output directory.  Regardless,
the user can specify -DLLVM_EXTERNAL_LIT to point to a specific
lit.py on their file system.  This supports the use case of
someone installing lit via a package manager.  If it cannot find
a source tree, and -DLLVM_EXTERNAL_LIT is either unspecified or
invalid, then we print a warning that tests will not be able
to run.

Differential Revision: https://reviews.llvm.org/D37756

llvm-svn: 313407
2017-09-15 22:10:46 +00:00
Zachary Turner 83dcb68468 Revert "[lit] Force site configs to run before source-tree configs"
This patch is still breaking several multi-stage compiler-rt bots.
I already know what the fix is, but I want to get the bots green
for now and then try re-applying in the morning.

llvm-svn: 313335
2017-09-15 02:56:40 +00:00
Zachary Turner a0e55b6403 [lit] Force site configs to be run before source-tree configs
This patch simplifies LLVM's lit infrastructure by enforcing an ordering
that a site config is always run before a source-tree config.

A significant amount of the complexity from lit config files arises from
the fact that inside of a source-tree config file, we don't yet know if
the site config has been run.  However it is *always* required to run
a site config first, because it passes various variables down through
CMake that the main config depends on.  As a result, every config
file has to do a bunch of magic to try to reverse-engineer the location
of the site config file if they detect (heuristically) that the site
config file has not yet been run.

This patch solves the problem by emitting a mapping from source tree
config file to binary tree site config file in llvm-lit.py. Then, during
discovery when we find a config file, we check to see if we have a
target mapping for it, and if so we use that instead.
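
The lookup described above can be sketched as follows. This is a minimal
illustration, not lit's actual code: resolve_config and the paths in the
mapping are hypothetical names chosen for this example.

```python
def resolve_config(found_config, config_map):
    """Return the site config mapped to found_config, or found_config if unmapped."""
    return config_map.get(found_config, found_config)


# Hypothetical mapping from a source-tree config to its binary-tree site config.
config_map = {
    "/src/polly/test/lit.cfg": "/build/polly/test/lit.site.cfg",
}

# A mapped source-tree config is replaced by its site config.
mapped = resolve_config("/src/polly/test/lit.cfg", config_map)

# A config without a mapping (e.g. an external user of lit) is used as found.
unmapped = resolve_config("/elsewhere/test/lit.cfg", config_map)
```

Falling back to the discovered path when no mapping exists is what keeps
users without a config mapping unaffected in this sketch.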

This mechanism is generic enough that it does not affect external users
of lit. They will just not have a config mapping defined, and everything
will work as normal.

On the other hand, for us it allows us to make many simplifications:

* We are guaranteed that a site config will be executed first
* Inside of a main config, we no longer have to assume that attributes
  might not be present and use getattr everywhere.
* We no longer have to pass parameters such as --param llvm_site_config=<path>
  on the command line.
* It is future-proof, meaning you don't have to edit llvm-lit.in to add
  support for new projects.
* All of the duplicated logic of trying various fallback mechanisms of
  finding a site config from the main config are now gone.

One potentially noteworthy thing that was required to implement this
change is that whereas the ninja check targets previously used the first
method to spawn lit, they now use the second. In particular, you can no
longer run lit.py against the source tree while specifying the various
`foo_site_config=<path>` parameters.  Instead, you need to run
llvm-lit.py.

Differential Revision: https://reviews.llvm.org/D37756

llvm-svn: 313270
2017-09-14 16:47:58 +00:00
Roman Gareev 925ce50f1b Unroll and separate the remaining parts of isolation
The remaining parts produced by the full partial tile isolation can contain
hot spots that are worth optimizing. Currently, we rely on the simple
loop unrolling pass, LICM and the SLP vectorizer to optimize such parts.
However, this approach can suffer from the lack of the aliasing information
that Polly provides using additional alias metadata, and/or from the lack
of the information required by the simple loop unrolling pass.

This patch is the first step to optimize the remaining parts. To do it, we
unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps
to increase the performance of the generated code from 39.87 GFlop/s to
49.23 GFlop/s.

The next possible step is to avoid unrolling performed by Polly in case of
isolated and remaining parts and rely only on simple loop unrolling pass and
the Loop vectorizer.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D37692

llvm-svn: 312929
2017-09-11 17:46:47 +00:00
Michael Kruse 2f5cbc449a [CodeGen] Bitcast scalar writes to actual value.
The type of NewValue might change due to ScalarEvolution
looking through bitcasts. The synthesized NewValue therefore
becomes the type before the bitcast.

llvm-svn: 312718
2017-09-07 12:15:01 +00:00
Michael Kruse 8ee179d3b4 Revert "[ScopDetect/Info] Look through PHIs that follow an error block"
This reverts commit
r312410 - [ScopDetect/Info] Look through PHIs that follow an error block

The commit caused generation of invalid IR due to accessing a parameter
that does not dominate the SCoP.

llvm-svn: 312663
2017-09-06 19:05:40 +00:00
Michael Kruse 48c726f925 [test] Add forgotten REQUIRES: line.
llvm-svn: 312632
2017-09-06 13:11:24 +00:00
Michael Kruse bd84ce8931 [ZoneAlgo] Handle non-StoreInst/LoadInst MemoryAccesses including memset.
Up to now ZoneAlgo considered array elements accessed by something other
than a LoadInst or StoreInst as not analyzable. This patch removes that
restriction by using the unknown ValInst to describe the written
content, respectively the element type's null value in case of memset.

Differential Revision: https://reviews.llvm.org/D37362

llvm-svn: 312630
2017-09-06 12:40:55 +00:00
Michael Kruse 420c4863a9 [Simplify] Actually remove unused instruction from region header.
Since r312249 instructions of an entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.

Still, MemoryAccesses for unused instructions were removed. This led
to a failed assertion in the code generator when the MemoryAccess for
the still listed instruction was not found.

This should fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.

llvm-svn: 312566
2017-09-05 19:44:39 +00:00
Tobias Grosser d6e0679c4e [ForwardOp] Remove read accesses for all instructions that have been moved
Before this patch, OpTree did not consider forwarding an operand tree consisting
of only a single LoadInst as useful. The motivation was that, like an access to a
read-only variable, it would just replace one MemoryAccess by another. However,
in contrast to read-only accesses, this would replace a scalar access by an
array access, which is something worth doing.

In addition, leaving scalar MemoryAccess is problematic in that VirtualUse
prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM
value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as
having the same LoadInst in its own instruction list (due to being forwarded
from another operand tree).

With this patch we ensure that if a LoadInst is forwarded in any operand tree,
the operand tree containing just the LoadInst is forwarded as well, which
effectively removes the scalar MemoryAccess such that only the array access
remains, not both.

Thanks Michael for the detailed explanation.

Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman

Subscribers: hfinkel, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37424

llvm-svn: 312456
2017-09-03 19:52:15 +00:00
Tobias Grosser 701d943d12 [IslAst] Do not assert in case of empty min/max alias locations
In certain situations, the context in the isl_ast_build could make the
min/max locations of our alias sets empty, which would cause an
internal error in isl, which is then unable to derive a value for these
expressions. Check these conditions before code generating expressions and
instead assume that the alias check succeeded. This is valid, as the
corresponding memory accesses will not be executed under any valid context.

This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.

llvm-svn: 312455
2017-09-03 19:47:19 +00:00
Tobias Grosser 99ccf05694 [ScopHelper] Do not crash on unreachable blocks
This resolves llvm.org/PR34433. Thanks to Zhendong Su for reporting.

llvm-svn: 312451
2017-09-03 18:01:22 +00:00
Tobias Grosser 4baedc70d1 [ScopDetect/Info] Look through PHIs that follow an error block
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.

llvm-svn: 312410
2017-09-02 08:25:55 +00:00
Siddharth Bhat 3928e3f50a [ISLNodeBuilder] Materialize Fortran array sizes of arrays without memory accesses.
In Polly, we specifically add a parameter to represent the outermost dimension
size of Fortran arrays. We do this because this information is statically
available from the Fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.

It is wrong because if there is a case where we detect 2 Fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.

We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather than just all `polly::MemoryAccess`.
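
The difference between the two iteration strategies can be sketched with
plain Python containers. This is only a model: the array names, the
statement name and both helper functions are hypothetical and do not
correspond to Polly's API.

```python
# Hypothetical scop with two detected Fortran arrays, only one of them accessed.
arrays = ["A", "B"]
accesses = [("Stmt0", "A")]  # (statement, accessed array)


def sizes_from_accesses(accesses):
    """Old behaviour in this model: only arrays that appear in an access."""
    return {array for _, array in accesses}


def sizes_from_arrays(arrays):
    """New behaviour in this model: every array known to the scop."""
    return set(arrays)


materialized_old = sizes_from_accesses(accesses)
materialized_new = sizes_from_arrays(arrays)
missed = materialized_new - materialized_old
```

Here the size parameter of the unaccessed array B is only materialized when
iterating over the arrays themselves.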

 Differential Revision: https://reviews.llvm.org/D37379

llvm-svn: 312350
2017-09-01 18:55:43 +00:00
Michael Kruse 0c6c555beb Fix Memory Access of failing tests.
Mark scalar dependences for different statements belonging to same BB
as 'Inter'.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D37147

llvm-svn: 312324
2017-09-01 11:36:52 +00:00
Tobias Grosser 2307f86c47 [ForwardOpTree] Allow forwarding in the presence of region statements
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.

Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman

Reviewed By: Meinersbur

Subscribers: hfinkel, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37298

llvm-svn: 312249
2017-08-31 16:04:49 +00:00
Siddharth Bhat 56572c6a5e [PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible.
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.

So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.

This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.

Differential Revision: https://reviews.llvm.org/D37056

llvm-svn: 312239
2017-08-31 13:03:37 +00:00
Tobias Grosser c43d0360cc [BlockGenerator] Generate entry block of regions from instruction lists
This adds code generation support for the previous commit.

This patch has been re-applied, after the memory issue in the previous patch
has been fixed.

llvm-svn: 312211
2017-08-31 03:17:35 +00:00
Tobias Grosser bd15d13d4e [ScopInfo] Use statement lists for entry blocks of region statements
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.

We currently only model the entry block of a region statement, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.

This change set is reapplied, after a memory corruption issue had been fixed.

llvm-svn: 312210
2017-08-31 03:15:56 +00:00
Tobias Grosser d3edc16416 Revert "[ScopInfo] Use statement lists for entry blocks of region statements"
This reverts commit r312128. It caused some memory issues.

llvm-svn: 312209
2017-08-31 02:43:49 +00:00
Tobias Grosser 6f1f5cbb5b Revert "[BlockGenerator] Generate entry block of regions from instruction lists"
This reverts commit r312129. It caused some memory issues.

llvm-svn: 312208
2017-08-31 02:43:27 +00:00
Adrian Prantl 6120801066 Adapt testcase to LLVM change in DIGlobalVariableExpression.
llvm-svn: 312147
2017-08-30 18:12:35 +00:00
Tobias Grosser 1e34508bcc [BlockGenerator] Generate entry block of regions from instruction lists
This adds code generation support for the previous commit.

llvm-svn: 312129
2017-08-30 15:08:30 +00:00
Tobias Grosser 6fbe4c8501 [ScopInfo] Use statement lists for entry blocks of region statements
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.

We currently only model the entry block of a region statement, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.

llvm-svn: 312128
2017-08-30 15:08:21 +00:00
Michael Kruse 591255183b [ScopBuilder] Introduce metadata for splitting scop statement.
This patch allows annotating of metadata in ir instruction
(with "polly_split_after"), which specifies where to split a particular
scop statement.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36402

llvm-svn: 312107
2017-08-30 10:11:06 +00:00
Michael Kruse 4728184342 [ZoneAlgo] More fine-grained bail-out.
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumptions. This meant that neither OpTree could
forward any load nor DeLICM do anything in such cases, even if their
transformations were unrelated to the violations.

This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.

This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violations, which are not necessarily relevant
for all transformations.

Differential Revision: https://reviews.llvm.org/D37219

llvm-svn: 311929
2017-08-28 20:39:07 +00:00
Tobias Grosser ee8ad1c0ff [IslAst] Do not compare arrays in alias check which are known to be identical
This possibly helps to avoid run-time check failures in the COSMO kernels.

llvm-svn: 311920
2017-08-28 20:17:02 +00:00
Tobias Grosser 93ab558d2e [Detect] Consider nested loop profitable if entry block is not in loop
In cases where the entry block of a scop was not contained in a loop that was
part of the scop region and at the same time there was a loop surrounding the
scop, we failed to count the loops in the scop and consequently did not consider
the scop profitable. We correct this by only moving to the loop parent in case
the current loop is contained in the scop.

This increases the number of loops in COSMO which we assume to be profitable
from 3974 to 4981.

llvm-svn: 311863
2017-08-27 21:39:25 +00:00
Tobias Grosser 6d0970f64e Revert "[polly] Fix ScopDetectionDiagnostic test failure caused by r310940"
This reverts commit 950849ece9bb8fdd2b41e3ec348b9653b4e37df6.

This commit broke various buildbots.

llvm-svn: 311692
2017-08-24 19:47:15 +00:00
Michael Kruse b795bfc0d4 [CodeGen] Detect impossible partial write conditions more reliably.
Whether a partial write is tautological/unsatisfiable not only
depends on the access domain, but also on the domain covered
by its node in the AST.

In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).

      for (int c0 = 0; c0 < tmp5; c0 += 1) {
        Stmt_for_body344(c0);
        if (tmp5 >= c0 + 2)
          Stmt_cond_false(c0);
        Stmt_cond_end(c0);
      }
      if (tmp5 <= 0) {
        Stmt_for_body344(0);
        Stmt_cond_false(0);
        Stmt_cond_end(0);
      }

Isl cannot derive a subscript for an array element that is never accessed.
This caused an error in that no subscript expression has been generated
in IslNodeBuilder::createNewAccesses, but BlockGenerator expected one
to exist because there is an execution of that write, just not in that
ast node.

Fixed by inspecting whether isl generated a constant "false" ast expression
in the current ast node, instead of determining whether the access domain
is empty.

This should fix a compiler crash of the aosp buildbot.

llvm-svn: 311663
2017-08-24 14:51:35 +00:00
Andreas Simbuerger e478e2de83 [Polly][WIP] Scalar fully indexed expansion
Summary:
This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.

MemoryKind::Value seems to be working with no major modifications of D34982. A test case has been added. Unfortunately, no "run time" checks can be done for now because, as @Meinersbur explains in a comment on D34982, DependenceInfo needs to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.

MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :

```
void mse(double A[Ni], double B[Nj]) {
  int i,j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    for (int j = 0; j<Nj; j++) {
      tmp = tmp + 2;
    }
    B[i] = tmp;
  }
}
```

Polly gives us the following statements and memory accesses :

```
    Statements {
    	Stmt_for_body
            Domain :=
                { Stmt_for_body[i0] : 0 <= i0 <= 9999 };
            Schedule :=
                { Stmt_for_body[i0] -> [i0, 0, 0] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
            Instructions {
                  %tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
            }
    	Stmt_for_inc
            Domain :=
                { Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
            Schedule :=
                { Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
            Instructions {
                  %tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
                  %add = fadd double %tmp.11, 2.000000e+00
                  %exitcond = icmp ne i32 %inc, 10000
            }
    	Stmt_for_end
            Domain :=
                { Stmt_for_end[i0] : 0 <= i0 <= 9999 };
            Schedule :=
                { Stmt_for_end[i0] -> [i0, 2, 0] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                { Stmt_for_end[i0] -> MemRef_B[i0] };
            Instructions {
                  %add.lcssa = phi double [ %add, %for.inc ]
                  store double %add.lcssa, double* %arrayidx, align 8
                  %exitcond5 = icmp ne i64 %indvars.iv.next, 10000
            }
    }

```

and the following dependences :
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```

When trying to expand this memory access :
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```

The new access map would look like this :
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```

The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I must have missed something in my understanding of the idea.

Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>

Reviewers: Meinersbur, simbuerg, bollu

Reviewed By: Meinersbur

Subscribers: llvm-commits, pollydev, Meinersbur

Differential Revision: https://reviews.llvm.org/D36647

llvm-svn: 311619
2017-08-24 00:04:45 +00:00
Michael Kruse 7fac28fa4f [ScopDetect] Include zero-iteration loops in loop count.
Loops with zero iterations are, syntactically, loops. They have been
excluded from the loop counter even for the non-profitable counters.
This seems to be unintentional as the sentinel value of '0' minimal
iterations does exclude such loops.

Fix by never considering the iteration count when the sentinel
value of 0 is found.
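
The fixed behaviour can be sketched as below. This is an assumption-laden
model, not the ScopDetection code: count_loops is a hypothetical helper, and
the ">=" comparison against the minimum is an assumption about how a non-zero
threshold would be applied.

```python
def count_loops(trip_counts, min_iterations):
    """Count loops, ignoring trip counts when min_iterations is the sentinel 0."""
    if min_iterations == 0:
        # Sentinel: never consider the iteration count, so zero-iteration
        # loops are counted like any other loop.
        return len(trip_counts)
    # Assumed threshold semantics for non-zero values.
    return sum(1 for trips in trip_counts if trips >= min_iterations)


# Hypothetical trip counts of three loops, one of them with zero iterations.
trip_counts = [0, 4, 100]

all_loops = count_loops(trip_counts, 0)
loops_with_at_least_four = count_loops(trip_counts, 4)
```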

This makes the recently added NumTotalLoops counter redundant
with NumLoopsOverall, which now is equivalent. Hence, NumTotalLoops
is removed as well.

Note: The test case 'ScopDetect/statistics.ll' effectively does not
check profitability, because -polly-process-unprofitable is passed
to all test cases.

llvm-svn: 311551
2017-08-23 13:29:59 +00:00
Jakub Kuderski 0ac1e585fc [polly] Fix ScopDetectionDiagnostic test failure caused by r310940
Summary:
ScopDetection used to check if a loop within a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking against that situation, as infinite loops don't appear in regions anymore.

The test failure was observed on these two polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368
http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310

This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.

Reviewers: grosser, sanjoy, bollu

Reviewed By: grosser

Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36776

llvm-svn: 311503
2017-08-22 22:01:53 +00:00
Tobias Grosser 4a07bbe3f6 [IRBuilder] Only emit alias scop metadata for arrays, but not scalars
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.

This improves 2mm performance from 1.5 seconds to 0.17 seconds.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37028

llvm-svn: 311498
2017-08-22 21:58:48 +00:00
Roman Gareev 0956a606ff Disable the Loop Vectorizer in case of GEMM
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36928

llvm-svn: 311473
2017-08-22 17:38:46 +00:00
Michael Kruse a28260f486 [test] Do not pipe binary data to FileCheck.
llvm-svn: 311470
2017-08-22 17:09:56 +00:00
Michael Kruse 5b228bbb12 [ScopDetection] Add stat for total number of loops.
The total number of loops is useful as a baseline comparing how many
loops have been optimized in different configurations.

llvm-svn: 311469
2017-08-22 17:09:51 +00:00
Tobias Grosser 6683c81af8 test/GPGPU/invalid-kernel-assert-verifymodule.ll also requires assertions
llvm-svn: 311423
2017-08-22 03:12:29 +00:00
Siddharth Bhat 7bc77e87c8 [ScopInfo] Add option to treat all function parameters as dereferenceable.
Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferenceable attribute.

Polly is conservative when it comes to invariant load
hoisting, thus we add runtime checks to invariant load hoisted pointers
when we do not know that pointers are dereferenceable. This is correct behaviour,
but is a performance penalty.

Add a flag that treats all pointer parameters as dereferenceable. That
way, Polly can speculatively load-hoist parameters to functions without
runtime checks.

Differential Revision: https://reviews.llvm.org/D36461

llvm-svn: 311329
2017-08-21 11:57:04 +00:00
Tobias Grosser b09bd74da8 [GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.

llvm-svn: 311324
2017-08-21 09:52:08 +00:00
Tobias Grosser 5170b6627a [GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO

llvm-svn: 311322
2017-08-21 09:00:31 +00:00
Michael Kruse d091bf8d8e [MatMul] Make MatMul detection independent of internal isl representations.
The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36460

llvm-svn: 311302
2017-08-20 21:31:11 +00:00
Tobias Grosser e32498c9c3 Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268
2017-08-19 23:49:26 +00:00
Tobias Grosser ecb94a0392 [GPGPU] Correctly initialize array order and fixed_element information
Summary:
This information is necessary for PPCG to perform correct live-range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259
2017-08-19 20:21:22 +00:00
Philipp Schaad 50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Tobias Grosser 43df2020e7 [GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239
2017-08-19 12:58:28 +00:00
Andreas Simbuerger 8d5b257d02 [Polly][Bug fix] Wrong dependences filtering during Fully Indexed expansion
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
  int i,j;
  for (j = 0; j < Ni; j++) {
    for (int i = 0; i<Nj; i++)
S:    B[i] = i;
    for (int i = 0; i<Nj; i++)
T:    D[i] = i;

U:  A[j] = B[j];
      C[j] = D[j];
  }
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.

This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.

We also noticed that we need to bail out if a load comes after a store (at the same position) in the same statement. So a check was added to isExpandable.
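
The filtering idea can be illustrated on the example above. This is a
simplified model only: dependences are represented as hypothetical
(source statement, target statement, array) tuples rather than isl maps, and
sources_for is a helper invented for this sketch.

```python
# Dependences of the example: U reads B written by S and D written by T.
deps = [("S", "U", "B"), ("T", "U", "D")]


def sources_for(deps, target_stmt, array=None):
    """Return source statements of target_stmt, optionally for one array only."""
    return [
        src
        for src, dst, arr in deps
        if dst == target_stmt and (array is None or arr == array)
    ]


# Statement level: both dependences of U are returned together.
statement_level = sources_for(deps, "U")

# Reference level: the dependence is selected per statement and memory reference.
reference_level_b = sources_for(deps, "U", "B")
reference_level_d = sources_for(deps, "U", "D")
```

With the extra array key, each read in U can be matched to exactly one
writing statement in this model.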

Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>

Reviewers: Meinersbur, simbuerg, bollu

Reviewed By: Meinersbur, simbuerg

Subscribers: pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D36791

llvm-svn: 311165
2017-08-18 15:01:18 +00:00
Tobias Grosser ec02acfb98 [GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation time.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161
2017-08-18 13:38:12 +00:00
Siddharth Bhat b46847c035 [ScopInliner] Add a simple Scop-based inliner to polly.
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.

This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.

Differential Revision: https://reviews.llvm.org/D36832

llvm-svn: 311126
2017-08-17 21:57:23 +00:00
Siddharth Bhat a2c4112791 [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s.
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:

```
call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```

- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.

Differential Revision: https://reviews.llvm.org/D36825

llvm-svn: 311121
2017-08-17 20:26:38 +00:00
Tobias Grosser abc5416be1 [GPGPU] Make test case independent of LLVM names
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.

llvm-svn: 311118
2017-08-17 20:09:02 +00:00
Siddharth Bhat 8a2c07f6d4 [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
- If we have global arrays, we would like to rewrite them to global
  pointers which are allocated using `cudaMallocManaged`.

- If we have allocas in a function, we would like to rewrite them to
  heap-allocations with `cudaMallocManaged` and `cudaFree`.

- With these rewrite mechanisms, we can offload _any_ function to the
  GPU with no code rewrite whatsoever.

Differential Revision: https://reviews.llvm.org/D36516

llvm-svn: 311080
2017-08-17 11:22:52 +00:00
Tobias Grosser ed6a4acc7f Add rewrite by-reference parameter pass
Summary:
This pass detangles induction variables from functions which take variables by
reference. Most Fortran functions compiled with gfortran pass variables by
reference. Unfortunately, a common pattern, printf calls of induction variables,
prevents the promotion of the induction variable to a register in this
situation, which in turn inhibits any kind of loop analysis. To work around this
issue we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate to the mem2reg pass that the
induction variables can be promoted to registers and consequently enable SCEV to
work.

We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: mgorny, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36800

llvm-svn: 311066
2017-08-17 05:25:08 +00:00
Tobias Grosser 5502eb0986 Add missing 'REQUIRES' line
llvm-svn: 311046
2017-08-16 22:02:03 +00:00
Tobias Grosser e2a45f32dc [GPGPU] Also record invariant loads as kernel subtree values
Before this change kernels that used invariant loads would have resulted in
invalid PTX code.

llvm-svn: 311042
2017-08-16 21:37:53 +00:00
Jakub Kuderski 8fb57125b0 [Polly] XFAIL ReportLoopHasNoExit tests after r310940
ReportLoopHasNoExit started failing after r310940 that added
infinite loops to postdominators. The change made regions not
contain infinite loops anymore.

This patch unbreaks the polly tree by XFAILING the
ReportLoopHasNoExit test. Full fix is under review in D36776.

llvm-svn: 310980
2017-08-16 00:18:39 +00:00
Philip Pfaffe c3bcdc2f1a [JSON] Make the failure to parse a jscop file a hard error
Summary:
Before, if we failed to parse a jscop file, this was reported as an
error and importing was aborted. However, this isn't actually strong
enough, since although the import is aborted, the scop has already been
modified and is very likely broken. Instead, make this a hard failure
and throw an LLVM error. This new behaviour requires small changes to
the tests for the legacy pass, namely using `not` to verify the error.
Further, fixed the jscop file for the
base_pointer_load_is_inst_inside_invariant_1 testcase.

Reviewed By: Meinersbur

Split out of D36578.

llvm-svn: 310599
2017-08-10 14:53:25 +00:00
Philip Pfaffe e18f3f6708 Fix 310555: Require pollyacc instead of asserts
llvm-svn: 310595
2017-08-10 14:21:04 +00:00
Philip Pfaffe 0360e5a3c2 Fix r310304: Fix the lit testcases.
In opt, Polly passes are only available after -load.

llvm-svn: 310581
2017-08-10 10:54:26 +00:00
Tobias Grosser 4db39c4829 Add missing 'REQUIRES' line
llvm-svn: 310555
2017-08-10 08:11:47 +00:00
Tobias Grosser cff9696e11 [GPGPU] Make the ast_build available to block generator
This is necessary for partial writes (as used by delicm) to work.

llvm-svn: 310553
2017-08-10 08:00:56 +00:00
Siddharth Bhat 9298ff2dee [ManagedMemoryRewrite] [Polly] Erase original malloc and free. [NFC]
We do not need to keep `malloc` and `free` around since they are
replaced by `polly_{malloc,free}Managed`.

llvm-svn: 310504
2017-08-09 18:19:46 +00:00
Siddharth Bhat 5a1f872623 [ManagedMemoryRewrite] Remove test case that was submitted by mistake. [NFC]
llvm-svn: 310473
2017-08-09 13:34:54 +00:00
Siddharth Bhat c4a4af47f3 [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.

Currently, we rewrite malloc and free to the `polly_{malloc,free}Managed` variants.

A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.

Differential Revision: https://reviews.llvm.org/D36513

llvm-svn: 310471
2017-08-09 12:59:23 +00:00
Michael Kruse 40d083956c [CodeGen] Use isLatestArrayKind().
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.

This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.

llvm-svn: 310466
2017-08-09 12:27:51 +00:00
Siddharth Bhat 34eeabbca3 [PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.
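
The truncation described above can be reproduced with plain integer arithmetic. The following is only a sketch: both helper names are hypothetical, and rounding up to whole bytes is an assumption made for illustration, whereas the patch asks the data layout, which may additionally account for padding.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the old computation: integer division
// truncates, so any element narrower than 8 bits ends up with size 0.
constexpr std::uint64_t truncatedSizeInBytes(std::uint64_t SizeInBits) {
  return SizeInBits / 8;
}

// Hypothetical helper that rounds up to whole bytes instead. This is an
// illustrative stand-in, not the data layout query used by the patch.
constexpr std::uint64_t roundedUpSizeInBytes(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// A 1-bit element shows the difference between the two computations.
static_assert(truncatedSizeInBytes(1) == 0, "truncating division yields 0");
static_assert(roundedUpSizeInBytes(1) == 1, "rounding up yields 1 byte");
// For byte-multiple sizes both computations agree.
static_assert(truncatedSizeInBytes(32) == roundedUpSizeInBytes(32), "same");
```

An element size of 0 is harmful wherever it is multiplied by an element count, since every array then appears to occupy no memory.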

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448
2017-08-09 08:29:16 +00:00
Michael Kruse 235726ee4b [test] Add descriptions and pseudocode to tests. NFC.
llvm-svn: 310385
2017-08-08 17:26:19 +00:00
Roman Gareev 1563f039f5 Use SCEV information for the second level aliasing
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.

In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sunk or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D35761

llvm-svn: 310380
2017-08-08 16:50:28 +00:00
Roman Gareev dbde718676 Do not use isl_set_project_out to get all loop prefixes
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36278

llvm-svn: 310374
2017-08-08 16:15:33 +00:00
Siddharth Bhat 9aca1cb519 [NFC] [PPCGCodeGen] Add missing REQUIRES: pollyacc line.
llvm-svn: 310354
2017-08-08 12:26:37 +00:00
Siddharth Bhat 71dfb3eb07 [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350
2017-08-08 12:00:59 +00:00
Michael Kruse 27c010a22e [DeLICM] Properly handle PHI writes becoming empty partial writes.
It is possible that partial writes are empty (the write is never executed).
This happens when a PHINode's incoming edge is never taken, such that
the incoming write becomes an empty partial write (if partial writes are
enabled). The issue is that when converting the union_map to a map, its
space cannot be derived from the union_map itself. Rather, we need to
determine its space independently.

This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.

llvm-svn: 310348
2017-08-08 11:27:12 +00:00
Tobias Grosser 327e9ecb0d [ScheduleOptimizer] Make matmul pattern detection work with delicm output
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teaches the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.

llvm-svn: 310338
2017-08-08 06:15:15 +00:00
Tobias Grosser 736c44c848 [test] Add some missing options that become necessary after the recent default changes
llvm-svn: 310315
2017-08-07 22:10:23 +00:00
Tobias Grosser a98081c9f5 [test] Add one more test case for the previous commit
llvm-svn: 310312
2017-08-07 22:02:06 +00:00
Tobias Grosser 2ef378120d [ZoneAlgo] Allow two writes that write identical values into same array slot
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
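
The relaxed rule can be sketched as a small predicate. This is a simplified model under stated assumptions: the `Write` struct, its fields and `isConflicting` are hypothetical names, and integer ids stand in for array slots and written values, whereas the actual analysis reasons about whole sets of statement instances.

```cpp
// Hypothetical, simplified model of a write access: which array slot is
// written and an id identifying the value that is stored there.
struct Write {
  int Array;   // id of the array being written
  int Slot;    // index of the written element
  int ValueId; // id of the stored value; equal ids mean identical values
};

// Two writes conflict only if they target the very same array slot and
// store different values. Writes of identical values are tolerated.
bool isConflicting(const Write &A, const Write &B) {
  bool SameSlot = A.Array == B.Array && A.Slot == B.Slot;
  return SameSlot && A.ValueId != B.ValueId;
}
```

With this predicate, two writes of the same value into the same slot are accepted, while the same pair with differing values is still rejected.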

llvm-svn: 310311
2017-08-07 22:01:29 +00:00
Andreas Simbuerger 81fb6b3e40 [Polly] Fully-Indexed static expansion
This commit implements the initial version of fully-indexed static
expansion.

```
 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[j] = j;
T: A[i] = B[i]
```

After the pass, we want this :
```
 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[i][j] = j;
T: A[i] = B[i][i]
```
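
To make the intent of the transformation concrete, the two loop nests can be written as runnable C++. This is a sketch under assumptions: the function names are hypothetical, `std::vector` stands in for the arrays, and statement T is read as executing once per outer iteration, after the inner loop.

```cpp
#include <vector>

// Original form: every outer iteration i overwrites the same slots B[j],
// so all iterations share one one-dimensional array B.
std::vector<int> computeOriginal(int Ni) {
  std::vector<int> A(Ni), B(Ni);
  for (int i = 0; i < Ni; i++) {
    for (int j = 0; j < Ni; j++)
      B[j] = j;  // S
    A[i] = B[i]; // T
  }
  return A;
}

// Expanded form: B gains a dimension indexed by i, so each outer
// iteration writes its own row and no element is written twice.
std::vector<int> computeExpanded(int Ni) {
  std::vector<int> A(Ni);
  std::vector<std::vector<int>> B(Ni, std::vector<int>(Ni));
  for (int i = 0; i < Ni; i++) {
    for (int j = 0; j < Ni; j++)
      B[i][j] = j;  // S
    A[i] = B[i][i]; // T
  }
  return A;
}
```

Both functions return the same `A` for any non-negative `Ni`. The expanded version trades memory (`Ni * Ni` elements instead of `Ni`) for the absence of repeated writes to the same element.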

For now we bail (fail) in the following cases:
  - Scalar access
  - Multiple writes per SAI
  - MayWrite Access
  - Expansion that leads to an access to the original array

Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.

Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>

Reviewers: simbuerg, Meinersbur, bollu

Reviewed By: Meinersbur

Subscribers: mgorny, llvm-commits, pollydev

Differential Revision: https://reviews.llvm.org/D34982

llvm-svn: 310304
2017-08-07 20:54:20 +00:00
Michael Kruse 70af4f579d [ForwardOpTree] Use known array content analysis to forward load instructions.
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.

The analysis is now enabled by default.

The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.

Differential Revision: https://reviews.llvm.org/D36380

llvm-svn: 310279
2017-08-07 18:40:29 +00:00
Tobias Grosser aabfbfa5fc Add missing 'REQUIRES: pollyacc' line
llvm-svn: 310197
2017-08-06 11:21:09 +00:00
Tobias Grosser b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which did not necessarily
dominate all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser 5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder`.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Tobias Grosser c1cfe0a828 Add missing REQUIRES line
llvm-svn: 309943
2017-08-03 14:46:53 +00:00
Tobias Grosser b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make Scop::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Michael Kruse 291fd8074e [test] Fix test case without Polly-ACC.
llvm-svn: 309938
2017-08-03 13:44:31 +00:00
Siddharth Bhat eadf76d34a [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the parameters present in the program.

The construction of the `isl_multi_pw_aff` requires all the individual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Singapuram Sanjay Srivallabh 188053af5e Remove debug metadata from copied instruction to prevent GPUModule verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36161

llvm-svn: 309822
2017-08-02 15:20:07 +00:00
Michael Kruse bc88a78cb4 [Simplify] Rewrite redundant write detection algorithm.
The previous algorithm was to search for a write and the source of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).

The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.
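
The single-pass idea can be sketched on a heavily simplified model. Everything here is an assumption made for illustration: `Access`, `AccessKind` and `findRedundantWrites` are hypothetical names, and integer ids replace the isl access relations and llvm::Values that the real pass works with.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

enum class AccessKind { Read, Write };

// Hypothetical, simplified access: one array element and one value id.
struct Access {
  AccessKind Kind;
  int Element; // index of the accessed array element
  int ValueId; // id of the value that is loaded or stored
};

// Scan the accesses of one statement in execution order and return the
// positions of writes that only store back a value read earlier from the
// same element, with no differing write to that element in between.
std::vector<std::size_t>
findRedundantWrites(const std::vector<Access> &Accesses) {
  // Element -> value id it is still known to hold since it was read.
  std::unordered_map<int, int> Unchanged;
  std::vector<std::size_t> Redundant;
  for (std::size_t I = 0; I < Accesses.size(); ++I) {
    const Access &Acc = Accesses[I];
    if (Acc.Kind == AccessKind::Read) {
      Unchanged[Acc.Element] = Acc.ValueId;
      continue;
    }
    auto It = Unchanged.find(Acc.Element);
    if (It != Unchanged.end() && It->second == Acc.ValueId) {
      Redundant.push_back(I); // stores back what the element already holds
      continue;
    }
    // Any other write changes the element, so forget what we knew.
    Unchanged.erase(Acc.Element);
  }
  return Redundant;
}
```

Each access is visited once and the map operations take expected constant time, which matches the linear bound stated above when the cost of the set operations is left out.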

We now also support the case where the write is a partial write of the
read's domain, and to some extent non-affine subregions.

Differential Revision: https://reviews.llvm.org/D36137

llvm-svn: 309734
2017-08-01 20:01:34 +00:00
Michael Kruse 693ef99935 [Simplify] Improve scalability.
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between accesses can become
complex such that isl takes considerable time and memory
for operations on them.

The problems identified were:

- is_subset() takes considerable time with many disjoints in the
  arguments. We limit the number of disjoints to 4, any additional
  information is thrown away.

- subtract() can lead to many disjoints. We instead assume that any
  array element is possibly accessed, which removes all disjoints.

- subtract_domain() may lead to considerable processing, even if all
  elements are to be removed. Instead, we determine and
  remove the affected spaces manually. No behaviour is changed.

llvm-svn: 309728
2017-08-01 19:39:11 +00:00
Siddharth Bhat 1ec9cba4e3 [NFC] Add 'REQUIRES: pollyacc' on 'test/GPGPU/invariant-load-hoisting-of-array.ll'
- Should fix broken build due to `r309681`.

llvm-svn: 309686
2017-08-01 14:52:18 +00:00
Siddharth Bhat edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that corresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Michael Kruse 9f6e41cdba [ForwardOpTree] Support synthesizable values.
This allows -polly-optree to move instructions that depend on
synthesizable values.

The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.

At the moment we cannot forward PHI nodes, so crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks, one for entering the loop and one
for the backedge, any instruction to be forwarded must have a phi between
use and definition).

A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.

Differential Revision: https://reviews.llvm.org/D36102

llvm-svn: 309609
2017-07-31 19:46:21 +00:00
Michael Kruse 57cc92b790 [Simplify] Remove all kinds of redundant scalar writes.
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).

llvm-svn: 309582
2017-07-31 17:04:55 +00:00
Tobias Grosser 8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Tobias Grosser 39977e4e76 Revert "Remove Debug metadata from copied instruction to prevent Module verification failure"
This reverts commit r309490 as it triggers error messages of the following
form on our AOSP buildbot:

inlinable function call in a function with debug info must have a !dbg location

llvm-svn: 309556
2017-07-31 11:43:38 +00:00
Singapuram Sanjay Srivallabh cf9a813368 Remove Debug metadata from copied instruction to prevent Module verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

Reviewers: grosser, bollu, Meinersbur

Reviewed By: grosser, bollu

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35630

llvm-svn: 309490
2017-07-29 18:03:49 +00:00
Michael Kruse ce9617f4fe [Simplify] Implement write accesses coalescing.
Write coalescing combines write accesses that

- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
  writes), write to the same element.
- There is no other access between them that accesses the same element.

This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.

Differential Revision: https://reviews.llvm.org/D36010

llvm-svn: 309489
2017-07-29 16:21:16 +00:00
Michael Kruse 4335c3992a [test] Add test case for -polly-simplify. NFC.
llvm-svn: 309458
2017-07-29 00:06:06 +00:00