Commit Graph

472 Commits

Author SHA1 Message Date
Tobias Grosser b3a85884f7 Do not use wrapping ranges to bound non-affine accesses
When deriving the range of valid values of a scalar evolution expression might
be a range [12, 8), where the upper bound is smaller than the lower bound and
where the range is expected to possibly wrap around. We theoretically could
model such a range as a union of two non-wrapping ranges, but do not do this
as of yet. Instead, we just do not derive any bounds. Before this change,
we could have obtained bounds where the maximal possible value is strictly
smaller than the minimal possible value, which is incorrect and also caused
assertions during scop modeling.

llvm-svn: 294891
2017-02-12 08:11:12 +00:00
Tobias Grosser 75dfaa1dbe BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.

Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.

It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean to pull out getNewValue, but to also change the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplications to a subsequent commit.

We also document the current behavior a little bit better.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28892

llvm-svn: 292486
2017-01-19 14:12:45 +00:00
Tobias Grosser bb1e386c4d Update tests to more precise analysis results in LLVM core
LLVM's range analysis became a little tighter, which means Polly can derive
tighter bounds as well.

llvm-svn: 291718
2017-01-11 22:53:34 +00:00
Tobias Grosser 94e5371dde Update to isl-0.18-43-g0b4256f
Even more isl coalesce changes.

llvm-svn: 290783
2016-12-31 07:46:11 +00:00
Tobias Grosser bedef00e2c [ScopInfo] Fold constant coefficients in array dimensions to the right
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.

   struct com {
     double Real;
     double Img;
   };

   void foo(long n, struct com A[][n]) {
     for (long i = 0; i < 100; i++)
       for (long j = 0; j < 1000; j++)
         A[i][j].Real += A[i][j].Img;
   }

   int main() {
     struct com A[100][1000];
     foo(1000, A);

llvm-svn: 288489
2016-12-02 08:10:56 +00:00
Johannes Doerfert b6c5a5dd01 [FIX] Do not try to hoist obviously overwritten loads
llvm-svn: 288328
2016-12-01 11:10:45 +00:00
Tobias Grosser b45ae5601b [ScopDetect] Expand statistics of the detected scops
We now collect:

  Number of total loops
  Number of loops in scops
  Number of scops
  Number of scops with maximal loop depth 1
  Number of scops with maximal loop depth 2
  Number of scops with maximal loop depth 3
  Number of scops with maximal loop depth 4
  Number of scops with maximal loop depth 5
  Number of scops with maximal loop depth 6 and larger
  Number of loops in scops (profitable scops only)
  Number of scops (profitable scops only)
  Number of scops with maximal loop depth 1 (profitable scops only)
  Number of scops with maximal loop depth 2 (profitable scops only)
  Number of scops with maximal loop depth 3 (profitable scops only)
  Number of scops with maximal loop depth 4 (profitable scops only)
  Number of scops with maximal loop depth 5 (profitable scops only)
  Number of scops with maximal loop depth 6 and larger (profitable scops only)

These statistics are certainly completely accurate as we might drop scops
when building up their polyhedral representation, but they should give a good
indication of the number of scops we detect.

llvm-svn: 287973
2016-11-26 07:37:46 +00:00
Tobias Grosser 5c00b0dc74 [ScopDetectionDiagnostic] Collect statistics for each diagnostic type
Our original statistics were added before we introduced a more fine-grained
diagnostic system, but the granularity of our statistics has never been
increased accordingly. This change introduces now one statistic counter per
diagnostic to enable us to collect fine-grained statistics about who certain
scops are not detected. In case coarser grained statistics are needed, the
user is expected to combine counters manually.

llvm-svn: 287968
2016-11-26 05:53:09 +00:00
Johannes Doerfert 6cd59e9076 Probably overwritten loads should not be considered hoistable
Do not assume a load to be hoistable/invariant if the pointer is used by
another instruction in the SCoP that might write to memory and that is
always executed.

llvm-svn: 287272
2016-11-17 22:25:17 +00:00
Tobias Grosser a2f8fa33aa [ScopDetect] Evaluate and verify branches at branch condition, not icmp
The validity of a branch condition must be verified at the location of the
branch (the branch instruction), not the location of the icmp that is
used in the branch instruction. When verifying at the wrong location, we
may accept an icmp that is defined within a loop which itself dominates, but
does not contain the branch instruction. Such loops cannot be modeled as
we only introduce domain dimensions for surrounding loops. To address this
problem we change the scop detection to evaluate and verify SCEV expressions at
the right location.

This issue has been around since at least r179148 "scop detection: properly
instantiate SCEVs to the place where they are used", where we explicitly
set the scope to the wrong location. Before this commit the scope
was not explicitly set, which probably also resulted in the scope around the
ICmp to be choosen.

This resolves http://llvm.org/PR30989

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286769
2016-11-13 19:27:04 +00:00
Tobias Grosser f67433abd9 SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr
Assumptions can either be added for a given basic block, in which case the set
describing the assumptions is expected to match the dimensions of its domain.
In case no basic block is provided a parameter-only set is expected to describe
the assumption.

The piecewise expressions that are generated by the SCEVAffinator sometimes
have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }),
which looks similar to a parameter-only domain, but is still a set domain.

This change adds an assert that checks that we always pass parameter domains to
addAssumptions if BB is empty to make mismatches here fail early.

We also change visitTruncExpr to always convert to parameter sets, if BB is
null. This change resolves http://llvm.org/PR30941

Another alternative to this change would have been to inspect all code to make
sure we directly generate in the SCEV affinator parameter sets in case of empty
domains. However, this would likely complicate the code which combines parameter
and non-parameter domains when constructing a statement domain. We might still
consider doing this at some point, but as this likely requires several non-local
changes this should probably be done as a separate refactoring.

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286444
2016-11-10 11:44:10 +00:00
Tobias Grosser 4d543d654a SCEVValidator: add new parameters resulting from constant extraction
When extracting constant expressions out of SCEVs, new parameters may be
introduced, which have not been registered before. This change scans
SCEV expressions after constant extraction again to make sure newly
introduced parameters are registered.

We may for example extract the constant '8' from the expression '((8 * ((%a *
%b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'.
The new expression has a new parameter '(-1 + %b) * %a)', which was not
registered before, but must be registered to not crash.

This closes http://llvm.org/PR30953

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286430
2016-11-10 06:45:28 +00:00
Eli Friedman b9c6f01a81 [ScopInfo] Make memset etc. affine where possible.
We don't actually check whether a MemoryAccess is affine in very many
places, but one important one is in checks for aliasing.

Differential Revision: https://reviews.llvm.org/D25706

llvm-svn: 285746
2016-11-01 20:53:11 +00:00
Eli Friedman 6768285dcc Add missing test from r284848.
Original commit title: [SCEVAffinator] Make precise modular math
more correct.

llvm-svn: 285745
2016-11-01 20:45:28 +00:00
Michael Kruse 426e6f71f8 [ScopInfo] Fix: use raw source pointer.
When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw
source and target pointers which preserve bitcasts.
MemAccInst::getPointerOperand() also returns the raw target pointers, but
Scop::buildAliasGroups() did not for the source pointer. This lead to mismatches
between AliasSetTracker and ScopInfo on which pointer to use.

Fixed by also using raw pointers in Scop::buildAliasGroups().

llvm-svn: 285071
2016-10-25 13:37:43 +00:00
Eli Friedman 286c5a76ba [SCEVAffinator] Make precise modular math more correct.
Integer math in LLVM IR is modular. Integer math in isl is
arbitrary-precision. Modeling LLVM IR math correctly in isl requires
either adding assumptions that math doesn't actually overflow, or
explicitly wrapping the math. However, expressions with the "nsw" flag
are special; we can pretend they're arbitrary-precision because it's
undefined behavior if the result wraps. SCEV expressions based on IR
instructions with an nsw flag also carry an nsw flag (roughly; actually,
the real rule is a bit more complicated, but the details don't matter
here).

Before this patch, SCEV flags were also overloaded with an additional
function: the ZExt code was mutating SCEV expressions as a hack to
indicate to checkForWrapping that we don't need to add assumptions to
the operand of a ZExt; it'll add explicit wrapping itself. This kind of
works... the problem is that if anything else ever touches that SCEV
expression, it'll get confused by the incorrect flags.

Instead, with this patch, we make the decision about whether to
explicitly wrap the math a bit earlier, basing the decision purely on
the SCEV expression itself, and not its users.

Differential Revision: https://reviews.llvm.org/D25287

llvm-svn: 284848
2016-10-21 18:08:02 +00:00
Michael Kruse 6b87504973 [test] Fix buildbot after SCEV change.
Update test after commit r284501:
[SCEV] Make CompareValueComplexity a little bit smarter

Contributed-by: Sanjoy Das <sanjoy@playingwithpointers.com>
llvm-svn: 284543
2016-10-18 22:58:09 +00:00
Michael Kruse 2ddb279a39 [test] Add missing colon.
llvm-svn: 284349
2016-10-16 22:05:51 +00:00
Michael Kruse f0c06900ed [test] Add -polly-unprofitable-scalar-accs to test that needs it.
The test non_affine_loop_used_later.ll also tests the profability heuristic. Add
the option -polly-unprofitable-scalar-accs explicitely to ensure that the test
succeeds if the default value is changed.

llvm-svn: 284338
2016-10-16 18:13:01 +00:00
Michael Kruse fa53c86dc1 [ScopInfo/CodeGen] ExitPHI reads are implicit.
Under some conditions MK_Value read accessed where converted to MK_ExitPHI read
accessed. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.

Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.

Do not convert value accessed to ExitPHI accesses and do not handle
value accesses like ExitPHI accessed in CodeGen anymore.

llvm-svn: 284023
2016-10-12 16:31:09 +00:00
Michael Kruse 6ab4476835 [ScopInfo] Add -polly-unprofitable-scalar-accs option.
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. Such a statement instances
necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be
changed to array accesses, which can avoid these WAW-dependence.

llvm-svn: 283233
2016-10-04 17:33:39 +00:00
Michael Kruse ca7cbcca37 [ScopInfo] Scalar access do not have indirect base pointers.
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.

This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test SAI tried to get the origin base pointer that was only declared later,
therefore not existing. This is probably only possible for scalars used in
PHINode incoming blocks.

llvm-svn: 283232
2016-10-04 17:33:34 +00:00
Michael Kruse 4a080de057 Add %loadPolly to test command line.
Required for out-of-tree builds of Polly.

llvm-svn: 279657
2016-08-24 19:12:48 +00:00
Eli Friedman 28671c83d6 [SCEVValidator] Don't reorder multiplies in extractConstantFactor.
The existing code would add the operands in the wrong order, and eventually
crash because the SCEV expression doesn't exactly match the parameter SCEV
expression in SCEVAffinator::visit. (SCEV doesn't sort the operands to
getMulExpr in general.)

Differential Revision: https://reviews.llvm.org/D23592

llvm-svn: 279087
2016-08-18 16:30:42 +00:00
Tobias Grosser b143e31164 [ScopInfo] Make scalars used by PHIs in non-affine regions available
Normally this is ensured when adding PHI nodes, but as PHI node dependences
do not need to be added in case all incoming blocks are within the same
non-affine region, this was missed.

This corrects an issue visible in LNT's sqlite3, in case invariant load hoisting
was disabled.

llvm-svn: 278792
2016-08-16 11:44:48 +00:00
Tobias Grosser 7cb809983d [tests] Force invariant load hoisting for test cases that need it -- III
llvm-svn: 278673
2016-08-15 15:56:24 +00:00
Tobias Grosser ad61c170d5 [tests] Force invariant load hoisting for test cases that need it II
llvm-svn: 278669
2016-08-15 13:58:16 +00:00
Tobias Grosser 75b9c7df4d [test] Correct spelling in test case
and explicitly enable invariant load hoisting for this test case.

llvm-svn: 278668
2016-08-15 13:58:04 +00:00
Tobias Grosser 6e6264c142 [tests] Force invariant load hoisting for test cases that need it
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.

llvm-svn: 278667
2016-08-15 13:27:49 +00:00
Tobias Grosser c59b3ce044 [BlockGenerator] Also eliminate dead code not originating from BB
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination, if
they have a corresponding instruction in the original BB that belongs to
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also considers code for dead code elimination, which
does not have a corresponding instruction in BB.

This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.

llvm-svn: 278103
2016-08-09 08:59:05 +00:00
Tobias Grosser 9d12d8ade3 BlockGenerator: remove dead instructions in normal statements
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids troubles in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen, in case the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.

llvm-svn: 276263
2016-07-21 11:48:36 +00:00
Michael Kruse 3b0a9934fa Add CHECK line to test case. NFC.
Check not only that the compiler is not crashing, but also whether the
probablematic part (The sequence of instructions simplified to '4') is reflected
in the output.

Thanks to Tobias for the hint.

llvm-svn: 275189
2016-07-12 16:37:50 +00:00
Michael Kruse e448364320 [SCEVAffinator] Fix assertion checking for constant divisor.
An assertion in visitSDivInstruction() checked whether the divisor is constant
by checking whether the argument is a ConstantInt. However, SCEVValidator allows
the divisor to be simplified to a constant by ScalarEvolution.

We synchronize the implementation of SCEVValidator and SCEVAffinator to both
accept simplified SCEV expressions.

llvm-svn: 275174
2016-07-12 15:08:47 +00:00
Tobias Grosser 42eef3acd7 Add test case forgotten in r275053
llvm-svn: 275055
2016-07-11 12:15:06 +00:00
Michael Kruse 586e579fe8 Fix assertion due to buildMemoryAccess.
For llvm the memory accesses from nonaffine loops should be visible,
however for polly those nonaffine loops should be invisible/boxed.

This fixes llvm.org/PR28245

Cointributed-by: Huihui Zhang <huihuiz@codeaurora.org>

Differential Revision: http://reviews.llvm.org/D21591

llvm-svn: 274842
2016-07-08 12:38:28 +00:00
Tobias Grosser 2ea7c6e8d1 Ensure parameter names are isl-compatible
Without this change it is not possible for isl to parse the resulting objects
from their string representation.

llvm-svn: 274350
2016-07-01 13:40:28 +00:00
Johannes Doerfert 4ba65a5622 [GSoC 2016]New function pass ScopInfoWrapperPass
This patch adds a new function pass ScopInfoWrapperPass so that the
polyhedral description of a region, the SCoP, can be constructed and
used in a function pass.

Patch by Utpal Bora <cs14mtech11017@iith.ac.in>

Differential Revision: http://reviews.llvm.org/D20962

llvm-svn: 273856
2016-06-27 09:32:30 +00:00
Tobias Grosser 423642a597 Recommit: "Look through IntToPtr & PtrToInt instructions"
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.

This patch was originally contributed by Johannes Doerfert in r271888, but was
in conflict with the revert in r272483. This is a recommit with some minor
adjustment to the test cases to take care of differing instruction names.

llvm-svn: 272485
2016-06-11 19:26:08 +00:00
Tobias Grosser 3717aa5ddb This reverts recent expression type changes
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:

- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"

Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.

llvm-svn: 272483
2016-06-11 19:17:15 +00:00
Johannes Doerfert 695c6b476a [FIX] Model the rounding behaviour of SRem correctly
llvm-svn: 272001
2016-06-07 12:00:37 +00:00
Johannes Doerfert c0ece9b67e [NFC] Generate runtime checks after the SCoP
We now generate runtime checks __after__ the SCoP code generation and
  not before, though they are still inserted at the same position int
  the code. This allows to modify the runtime check during SCoP code
  generation.

llvm-svn: 271894
2016-06-06 13:32:52 +00:00
Johannes Doerfert dedb7693ec Look through IntToPtr & PtrToInt instructions
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
  such. In order to generate them properly as parameters we had to improve the
  ScopExpander, though the change is the first in the direction of a more
  aggressive scalar synthetization.

llvm-svn: 271888
2016-06-06 12:12:27 +00:00
Johannes Doerfert 4b2fd892ec [FIX] Do not recognize division by 0 as affine
llvm-svn: 271885
2016-06-06 12:08:34 +00:00
Johannes Doerfert 0767a511ba Use minimal types for generated expressions
We now use the minimal necessary bit width for the generated code. If
  operations might overflow (add/sub/mul) we will try to adjust the types in
  order to ensure a non-wrapping computation. If the type adjustment is not
  possible, thus the necessary type is bigger than the type value of
  --polly-max-expr-bit-width, we will use assumptions to verify the computation
  will not wrap. However, for run-time checks we cannot build assumptions but
  instead utilize overflow tracking intrinsics.

llvm-svn: 271878
2016-06-06 09:57:41 +00:00
Sanjoy Das 2084d784df [Polly] Fix test case after rL271151
Summary:
After rL271151 (SCEV change) SCEV no longer unconditionally transfers
nuw/nsw from the increment operation to the post-inc value; this
transfer only happens if there is undefined behavior in the program if
the increment overflowed (as opposed to just generating poison).

The loops in `wraping_signed_expr_1.ll` are in non-canonical
form (they're not rotated), and that defeats LLVM's poison-is-UB
analysis.  IMO the easiest fix here is to run `wraping_signed_expr_1.ll`
through `-loop-rotate` to canonicalize the loops, which is what this
patch does.

Reviewers: jdoerfert, Meinersbur, grosser

Subscribers: grosser, mcrosier, pollydev

Differential Revision: http://reviews.llvm.org/D20778

llvm-svn: 271536
2016-06-02 16:58:41 +00:00
Johannes Doerfert 6631bfdd1c [FIX] Correctly translate i1 expressions
llvm-svn: 271534
2016-06-02 16:57:12 +00:00
Johannes Doerfert fd7ddf1479 [FIX] Test case broken by r271522.
llvm-svn: 271531
2016-06-02 16:33:01 +00:00
Tobias Grosser 6a114505c4 Temporarily xfail test case which broke after a fix in SCEV
This should keep the buildbots quite while we decide how to update the test
case.

llvm-svn: 271169
2016-05-29 06:03:44 +00:00
Johannes Doerfert 25227fe7b0 Optimistic assume required invariant loads to be invariant
Before this patch we bailed if a required invariant load was potentially
  overwritten. However, now we will optimistically assume it is actually
  invariant and, to this end, restrict the valid parameter space as well as the
  execution context with regards to potential overwrites of the location.

llvm-svn: 270416
2016-05-23 10:40:54 +00:00
Johannes Doerfert cda1bd5048 Revert "Optimistic assume required invariant loads to be invariant"
This reverts commit 787e642207ca978f2e800140529fc7049ea1f3de until the
lnt failures are fixed.

llvm-svn: 270061
2016-05-19 13:47:34 +00:00
Johannes Doerfert cb77542d1c Optimistic assume required invariant loads to be invariant
So far we bailed if a required invariant load was potentially overwritten in
  the SCoP. From now on we will optimistically assume it is actually invariant
  and, to this end, restrict the valid parameter space.

llvm-svn: 270060
2016-05-19 13:24:10 +00:00
Johannes Doerfert 60dd9e1346 Compute the MaxLoopDepth during domain construction [NFC]
llvm-svn: 270052
2016-05-19 12:33:14 +00:00
Johannes Doerfert 6f1bb7a9d9 Support truncate operations
Truncate operations are basically modulo operations, thus we can model
  them that way. However, for large types we assume the operand to fit
  in the new type size instead of introducing a modulo with a very large
  constant.

llvm-svn: 269300
2016-05-12 15:13:49 +00:00
Johannes Doerfert 27d12d3d1f Invalidate unprofitable SCoPs after creation
If a profitable run is performed we will check if the SCoP seems to be
  profitable after creation but before e.g., dependence are computed. This is
  needed as SCoP detection only approximates the actual SCoP representation.
  In the end this should allow us to be less conservative during the SCoP
  detection while keeping the compile time in check.

llvm-svn: 269074
2016-05-10 16:38:09 +00:00
Johannes Doerfert bf9473b2d8 Weaken profitability constraints during ScopDetection
Regions with one affine loop can be profitable if the loop is
  distributable. To this end we will allow them to be treated as
  profitable if they contain at least two non-trivial basic blocks.

llvm-svn: 269064
2016-05-10 14:42:30 +00:00
Johannes Doerfert ede4ecaefb [FIX] Cleanup isl objects prior to early exit
llvm-svn: 269061
2016-05-10 14:01:21 +00:00
Johannes Doerfert 2b92a0e4ee Handle llvm.assume inside the SCoP
The assumption attached to an llvm.assume in the SCoP needs to be
  combined with the domain of the surrounding statement but can
  nevertheless be used to refine the context.

  This fixes the problems mentioned in PR27067.

llvm-svn: 269060
2016-05-10 14:00:57 +00:00
Johannes Doerfert 297c720d15 Propagate complexity problems during domain generation [NFC]
This patches makes the propagation of complexity problems during
  domain generation consistent. Additionally, it makes it less likely to
  encounter ill-formed domains later, e.g., during schedule generation.

llvm-svn: 269055
2016-05-10 13:06:42 +00:00
Johannes Doerfert 14b1cf35b5 [FIX] Create error-restrictions late
Before this patch we generated error-restrictions only for
  error-blocks, thus blocks (or regions) containing a not represented
  function call. However, the same reasoning is needed if the invalid
  domain of a statement subsumes its actual domain. To this end we move
  the generation of error-restrictions after the propagation of the
  invalid domains. Consequently, error-statements are now defined more
  general as statements that are assumed to be not executed.
  Additionally, we do not record an empty domain for such statements but
  a nullptr instead. This allows to distinguish between error-statements
  and dead-statements.

llvm-svn: 269053
2016-05-10 12:42:26 +00:00
Johannes Doerfert a60ad845c0 Simplify the internal representation according to the context [NFC]
We now use context information to simplify the domains and access
  functions of the SCoP instead of just aligning them with the parameter
  space.

llvm-svn: 269048
2016-05-10 12:18:22 +00:00
Johannes Doerfert 172dd8b923 Allow unsigned divisions
After zero-extend operations and unsigned comparisons we now allow
  unsigned divisions. The handling is basically the same as for signed
  division, except the interpretation of the operands. As the divisor
  has to be constant in both cases we can simply interpret it as an
  unsigned value without additional complexity in the representation.
  For the dividend we could choose from the different representation
  schemes introduced for zero-extend operations but for now we will
  simply use an assumption.

llvm-svn: 268032
2016-04-29 11:53:35 +00:00
Johannes Doerfert 99ec00a2bb [FIX] Typo
llvm-svn: 268027
2016-04-29 10:47:07 +00:00
Johannes Doerfert 3e48ee2ab9 [FIX] Unsigned comparisons change invalid domain
It does not suffice to take a global assumptions for unsigned comparisons but
  we also need to adjust the invalid domain of the statements guarded by such
  an assumption. To this end we allow to specialize the getPwAff call now in
  order to indicate unsigned interpretation.

llvm-svn: 268025
2016-04-29 10:44:41 +00:00
Johannes Doerfert 8475d1c163 [FIX] Correct assumption simplification
Assumptions and restrictions can both be simplified with the domain of a
  statement but not the same way. After this patch we will correctly
  distinguish them.

llvm-svn: 267885
2016-04-28 14:32:58 +00:00
Johannes Doerfert 792374b941 Allow unsigned comparisons
With this patch we will optimistically assume that the result of an unsigned
  comparison is the same as the result of the same comparison interpreted as
  signed.

llvm-svn: 267559
2016-04-26 14:33:12 +00:00
Johannes Doerfert 323ab3975b [FIX] Adjust assumption space for zext instructions
llvm-svn: 267552
2016-04-26 12:44:01 +00:00
Johannes Doerfert 625bb1fc10 Do not add but record signed-unsigned assumptions
llvm-svn: 267528
2016-04-26 09:16:36 +00:00
Johannes Doerfert 9cc8340fea Extract some constant factors from "SCEVAddExprs"
Additive expressions can have constant factors too that we can extract
  and thereby simplify the internal representation. For now we do
  compute the gcd of all constant factors but only extract the same
  (possibly negated) factor if there is one.

llvm-svn: 267445
2016-04-25 19:09:10 +00:00
Johannes Doerfert d5c369f460 Do not check all GEPs for assumptions
Before, we checked all GEPs in a statement in order to derive
  out-of-bound assumptions. However, this can not only introduce new
  parameters but it is also not clear what we can learn from GEPs that
  are not immediately used in a memory accesses inside the SCoP. As this
  case is very rare, no actual change in the behaviour is expected.

llvm-svn: 267442
2016-04-25 18:55:15 +00:00
Johannes Doerfert c78ce7dc21 Only add user assumptions on known parameters [NFC]
Before, assumptions derived from llvm.assume could reference new
  parameters that were not known to the SCoP before. These were neither
  beneficial to the representation nor to the user that reads the
  emitted remark. Now we project them out and keep only user assumptions
  on known parameters. Nevertheless, the new parameters are still part
  of the SCoPs parameter space as the SCEVAffinator currently adds them
  on demand.

llvm-svn: 267441
2016-04-25 18:51:27 +00:00
Johannes Doerfert c3596284c3 Model zext-extend instructions
A zero-extended value can be interpreted as a piecewise defined signed
  value. If the value was non-negative it stays the same, otherwise it
  is the sum of the original value and 2^n where n is the bit-width of
  the original (or operand) type. Examples:
    zext i8 127 to i32 -> { [127] }
    zext i8  -1 to i32 -> { [256 + (-1)] } = { [255] }
    zext i8  %v to i32 -> [v] -> { [v] | v >= 0; [256 + v] | v < 0 }

  However, LLVM/Scalar Evolution uses zero-extend (potentially lead by a
  truncate) to represent some forms of modulo computation. The left-hand side
  of the condition in the code below would result in the SCEV
  "zext i1 <false, +, true>for.body" which is just another description
  of the C expression "i & 1 != 0" or, equivalently, "i % 2 != 0".

    for (i = 0; i < N; i++)
      if (i & 1 != 0 /* == i % 2 */)
        /* do something */

  If we do not make the modulo explicit but only use the mechanism described
  above we will get the very restrictive assumption "N < 3", because for all
  values of N >= 3 the SCEVAddRecExpr operand of the zero-extend would wrap.
  Alternatively, we can make the modulo in the operand explicit in the
  resulting piecewise function and thereby avoid the assumption on N. For the
  example this would result in the following piecewise affine function:
  { [i0] -> [(1)] : 2*floor((-1 + i0)/2) = -1 + i0;
    [i0] -> [(0)] : 2*floor((i0)/2) = i0 }
  To this end we can first determine if the (immediate) operand of the
  zero-extend can wrap and, in case it might, we will use explicit modulo
  semantic to compute the result instead of emitting non-wrapping assumptions.

  Note that operands with large bit-widths are less likely to be negative
  because it would result in a very large access offset or loop bound after the
  zero-extend. To this end one can optimistically assume the operand to be
  positive and avoid the piecewise definition if the bit-width is bigger than
  some threshold (here MaxZextSmallBitWidth).

  We choose to go with a hybrid solution of all modeling techniques described
  above. For small bit-widths (up to MaxZextSmallBitWidth) we will model the
  wrapping explicitly and use a piecewise defined function. However, if the
  bit-width is bigger than MaxZextSmallBitWidth we will employ overflow
  assumptions and assume the "former negative" piece will not exist.

llvm-svn: 267408
2016-04-25 14:01:36 +00:00
Johannes Doerfert 85676e3674 Add an invalid domain to memory accesses
Memory accesses can have non-precisely modeled access functions that
  would cause us to build incorrect execution context for hoisted loads.
  This is the same issue that occurred during the domain construction for
  statements and it is dealt with the same way.

llvm-svn: 267289
2016-04-23 14:32:34 +00:00
Johannes Doerfert ac9c32e216 Translate SCEVs to isl_pw_aff and their invalid domain
The SCEVAffinator will now produce not only the isl representaiton of
  a SCEV but also the domain under which it is invalid. This is used to
  record possible overflows that can happen in the statement domains in
  the statements invalid domain. The result is that invalid loads have
  an accurate execution contexts with regards to the validity of their
  statements domain. While the SCEVAffinator currently is only taking
  "no-wrapping" assumptions, we can add more withouth worrying about the
  execution context of loads that are optimistically hoisted.

llvm-svn: 267288
2016-04-23 14:31:17 +00:00
Johannes Doerfert 1dc12aff8a Simplify the execution context for dereferencable loads
If we know it is safe to execute a load we do not need an execution
  context, however only if we are sure it was modeled correctly.

llvm-svn: 267284
2016-04-23 12:59:18 +00:00
Johannes Doerfert d77089e62d Bail for complex execution contexts of invariant loads
llvm-svn: 267146
2016-04-22 11:41:14 +00:00
Tobias Grosser ad25f0fee6 Update debug metadata after LLVM commits r266445+r266446
llvm-svn: 266473
2016-04-15 20:51:27 +00:00
Michael Kruse 5c7d0cb834 Add contexts to test cases. NFC.
As discussed in the Polly weekly phone call and reviews.llvm.org/D18878,
the assumed contexts changed (widen) due to D18878/r265942. Also check
these contexts in the tests affected by that change.

llvm-svn: 266323
2016-04-14 15:22:13 +00:00
Johannes Doerfert fb72187fdd [FIX] Check the invalid context agains the context to rule out SCoPs
llvm-svn: 266096
2016-04-12 17:54:29 +00:00
Johannes Doerfert 615e0b85f8 Record wrapping assumptions early
Utilizing the record option for assumptions we can simplify the wrapping
  assumption generation a lot. Additionally, we can now report locations
  together with wrapping assumptions, though they might not be accurate yet.

llvm-svn: 266069
2016-04-12 13:28:39 +00:00
Johannes Doerfert 3bf6e4129f Record assumptions first and add them later
There are three reasons why we want to record assumptions first before we
add them to the assumed/invalid context:

  1) If the SCoP is not profitable or otherwise invalid without the
     assumed/invalid context we do not have to compute it.
  2) Information about the context are gathered rather late in the SCoP
     construction (basically after we know all parameters), thus the user
     might see overly complicated assumptions to be taken while they would
     have been simplified later on.
  3) Currently we cannot take assumptions at any point but have to wait,
     e.g., for the domain generation to finish. This makes wrapping
     assumptions much more complicated as they need to be and it will
     have a similar effect on "signed-unsigned" assumptions later.

llvm-svn: 266068
2016-04-12 13:27:35 +00:00
Johannes Doerfert 7c01357cef Introduce an invalid context for each statement
Collect the error domain contexts (formerly in the ErrorDomainCtxMap)
  for each statement in the new InvalidContext member variable. While
  this commit is basically a [NFC] it is a first step to make hoisting
  sound by allowing a more fine grained record of invalid contexts,
  e.g., here on statement level.

llvm-svn: 266053
2016-04-12 09:57:34 +00:00
Michael Kruse 3b425ff232 Allow overflow of indices with constant dim-sizes.
Allow overflow of indices into the next higher dimension if it has
constant size. E.g.

    float A[32][2];
    ((float*)A)[5];

is effectively the same as

    A[2][1];

This can happen since r265379 as a side effect if ScopDetection
recognizes an access as affine, but ScopInfo rejects the GetElementPtr.

Differential Revision: http://reviews.llvm.org/D18878

llvm-svn: 265942
2016-04-11 14:34:08 +00:00
Johannes Doerfert 561d36b320 Allow pointer expressions in SCEVs again.
In r247147 we disabled pointer expressions because the IslExprBuilder did not
  fully support them. This patch reintroduces them by simply treating them as
  integers. The only special handling for pointers that is left detects the
  comparison of two address_of operands and uses an unsigned compare.

llvm-svn: 265894
2016-04-10 09:50:10 +00:00
Johannes Doerfert 41725a1e7a [FIX] Do not crash on opaque (unsized) types.
llvm-svn: 265834
2016-04-08 19:20:03 +00:00
Michael Kruse b3de24f5e6 Add testcase from PR27218. NFC.
The the bug has already been fixed r265795, but this second testcase
still useful.

llvm-svn: 265809
2016-04-08 16:54:53 +00:00
Michael Kruse 436c90619c [ScopInfo] Fix check for element size mismatch.
The way to get the elements size with getPrimitiveSizeInBits() is not
the same as used in other parts of Polly which should use
DataLayout::getTypeAllocSize(). Its use only queries the size of the
pointer and getPrimitiveSizeInBits returns 0 for types that require a
DataLayout object such as pointers.

Together with r265379, this should fix PR27195.

llvm-svn: 265795
2016-04-08 16:20:08 +00:00
Johannes Doerfert 3ef78d6d38 [FIX] Adjust execution context of hoisted loads wrt. error domains
If we build the domains for error blocks and later remove them we lose
  the information that they are not executed. Thus, in the SCoP it looks
  like the control will always reach the statement S:

            for (i = 0 ... N)
                if (*valid == 0)
                  doSth(&ptr);
          S:    A[i] = *ptr;

  Consequently, we would have assumed "ptr" to be always accessed and
  preloaded it unconditionally. However, only if "*valid != 0" we would
  execute the optimized version of the SCoP. Nevertheless, we would have
  hoisted and accessed "ptr"regardless of "*valid". This changes the
  semantic of the program as the value of "*valid" can cause a change of
  "ptr" and control if it is executed or not.

  To fix this problem we adjust the execution context of hoisted loads
  wrt. error domains. To this end we introduce an ErrorDomainCtxMap that
  maps each basic block to the error context under which it might be
  executed. Thus, to the context under which it is executed but an error
  block would have been executed to. To fill this map one traversal of
  the blocks in the SCoP suffices. During this traversal we do also
  "remove" error statements and those that are only reachable via error
  statements. This was previously done by the removeErrorBlockDomains
  function which is therefor not needed anymore.

  This fixes bug PR26683 and thereby several SPEC miscompiles.

Differential Revision: http://reviews.llvm.org/D18822

llvm-svn: 265778
2016-04-08 10:30:09 +00:00
Johannes Doerfert b47cbe1c72 [FIX] Handle multiplications in the SCEVAffinator again
If ScalarEvolution cannot look through some expression but we do, it
  might happen that a multiplication will arrive at the
  SCEVAffinator::visitMulExpr. While we could always try to improve the
  extractConstantFactor function we might still miss something, thus we
  reintroduce the code to generate multiplicative piecewise-affine
  functions as a fall-back.

llvm-svn: 265777
2016-04-08 10:27:40 +00:00
Johannes Doerfert e85d50defb Add test cases for the removal of error blocks
llvm-svn: 265776
2016-04-08 10:26:39 +00:00
Tobias Grosser 6a44b7fdf0 Add test case forgotten in r265379.
Thanks Johannes for reminding me.

llvm-svn: 265423
2016-04-05 17:40:07 +00:00
Johannes Doerfert 57c5f0b1c4 [FIX] Ensure SAI objects for exit PHIs
If all exiting blocks of a SCoP are error blocks and therefor not
  represented we will not generate accesses and consequently no SAI
  objects for exit PHIs. However, they are needed in the code generation
  to generate the merge PHIs between the original and optimized region.
  With this patch we enusre that the SAI objects for exit PHIs exist
  even if all exiting blocks turn out to be eror blocks.

  This fixes the crash reported in PR27207.

llvm-svn: 265393
2016-04-05 13:44:21 +00:00
Johannes Doerfert 1519491eaf Do not allow to complex branch conditions
Even before we build the domain the branch condition can become very
  complex, especially if we have to build the complement of a lot of
  equality constraints. With this patch we bail if the branch condition
  has a lot of basic sets and parameters.

  After this patch we now successfully compile
    External/SPEC/CINT2000/186_crafty/186_crafty
  with "-polly-process-unprofitable -polly-position=before-vectorizer".

llvm-svn: 265286
2016-04-04 07:59:41 +00:00
Johannes Doerfert 642594ae87 Exploit graph properties during domain generation
As a CFG is often structured we can simplify the steps performed during
  domain generation. When we push domain information we can utilize the
  information from a block A to build the domain of a block B, if A dominates B
  and there is no loop backede on a path from A to B. When we pull domain
  information we can use information from a block A to build the domain of a
  block B if B post-dominates A. This patch implements both ideas and thereby
  simplifies domains that were not simplified by isl. For the FINAL basic block
  in test/ScopInfo/complex-successor-structure-3.ll we used to build a universe
  set with 81 basic sets. Now it actually is represented as universe set.

  While the initial idea to utilize the graph structure depended on the
  dominator and post-dominator tree we can use the available region
  information as a coarse grained replacement. To this end we push the
  region entry domain to the region exit and pull it from the region
  entry for the region exit if applicable.

  With this patch we now successfully compile
    External/SPEC/CINT2006/400_perlbench/400_perlbench
  and
    SingleSource/Benchmarks/Adobe-C++/loop_unroll.

Differential Revision: http://reviews.llvm.org/D18450

llvm-svn: 265285
2016-04-04 07:57:39 +00:00
Johannes Doerfert d5edbd61a1 [FIX] Do not create a SCoP in the presence of infinite loops
If a loop has no exiting blocks the region covering we use during
  schedule genertion might not cover that loop properly. For now we bail
  out as we would not optimize these loops anyway.

llvm-svn: 265280
2016-04-03 23:09:06 +00:00
Tobias Grosser 151ae32dba Revert "[FIX] Do not create a SCoP in the presence of infinite loops"
This reverts commit r265260, as it caused the following 'make check-polly'
failures:

    Polly :: ScopDetect/index_from_unpredictable_loop.ll
    Polly :: ScopInfo/multiple_exiting_blocks.ll
    Polly :: ScopInfo/multiple_exiting_blocks_two_loop.ll
    Polly :: ScopInfo/schedule-const-post-dominator-walk-2.ll
    Polly :: ScopInfo/schedule-const-post-dominator-walk.ll
    Polly :: ScopInfo/switch-5.ll

llvm-svn: 265272
2016-04-03 19:36:52 +00:00
Johannes Doerfert 2075b5d2a1 [FIX] Do not create two SAI objects for exit PHIs
If an exit PHI is written and also read in the SCoP we should not create two
  SAI objects but only one. As the read is only modeled to ensure OpenMP code
  generation knows about it we can simply use the EXIT_PHI MemoryKind for both
  accesses.

llvm-svn: 265261
2016-04-03 11:16:00 +00:00
Johannes Doerfert 7dcceb82e9 [FIX] Do not create a SCoP in the presence of infinite loops
If a loop has no exiting blocks the region covering we use during
  schedule genertion might not cover that loop properly. For now we bail
  out as we would not optimize these loops anyway.

llvm-svn: 265260
2016-04-03 11:12:39 +00:00
Tobias Grosser 6deba4ea03 Revert 264782 and 264789
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:

FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time

The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.

llvm-svn: 264885
2016-03-30 18:18:31 +00:00
Johannes Doerfert a144fb148b Exploit graph properties during domain generation
As a CFG is often structured we can simplify the steps performed
  during domain generation. When we push domain information we can
  utilize the information from a block A to build the domain of a
  block B, if A dominates B. When we pull domain information we can
  use information from a block A to build the domain of a block B
  if B post-dominates A. This patch implements both ideas and thereby
  simplifies domains that were not simplified by isl. For the FINAL
  basic block in
    test/ScopInfo/complex-successor-structure-3.ll .
  we used to build a universe set with 81 basic sets. Now it actually is
  represented as universe set.

  While the initial idea to utilize the graph structure depended on the
  dominator and post-dominator tree we can use the available region
  information as a coarse grained replacement. To this end we push the
  region entry domain to the region exit and pull it from the region
  entry for the region exit.

Differential Revision: http://reviews.llvm.org/D18450

llvm-svn: 264789
2016-03-29 21:31:05 +00:00
Michael Kruse 88a2256a34 Revert "[ScopInfo] Fix domains after loops."
This reverts commit r264118. The approach is still under discussion.

llvm-svn: 264705
2016-03-29 07:50:52 +00:00
Johannes Doerfert 6462d8c1d9 Generalize the domain complexity restrictions
This patch applies the restrictions on the number of domain conjuncts
  also to the domain parts of piecewise affine expressions we generate.
  To this end the wording is change slightly. It was needed to support
  complex additions featuring zext-instructions but it also fixes PR27045.

  lnt profitable runs reports only little changes that might be noise:
  Compile Time:
    Polybench/[...]/2mm                     +4.34%
    SingleSource/[...]/stepanov_container   -2.43%
  Execution Time:
    External/[...]/186_crafty               -2.32%
    External/[...]/188_ammp                 -1.89%
    External/[...]/473_astar                -1.87%

llvm-svn: 264514
2016-03-26 16:17:00 +00:00
Johannes Doerfert 733ea34f38 [FIX] Handle accesses to "null" in MemIntrinsics
This fixes PR27035. While we now exclude MemIntrinsics from the
  polyhedral model if they would access "null" we could exploit this
  even more, e.g., remove all parameter combinations that would lead to
  the execution of this statement from the context.

llvm-svn: 264284
2016-03-24 13:50:04 +00:00
Tobias Grosser 25e8ebe29d Drop explicit -polly-delinearize parameter
Delinearization is now enabled by default and does not need to explicitly need
to be enabled in our tests.

llvm-svn: 264154
2016-03-23 13:21:02 +00:00
Tobias Grosser 898a636210 Add option to disallow modref function calls in scops.
This might be useful to evaluate the benefit of us handling modref funciton
calls. Also, a new bug that was triggered by modref function calls was
recently reported http://llvm.org/PR27035. To ensure the same issue does not
cause troubles for other people, we temporarily disable this until the bug
is resolved.

llvm-svn: 264140
2016-03-23 06:40:15 +00:00
Michael Kruse 49a59ca093 [ScopInfo] Fix domains after loops.
ISL can conclude additional conditions on parameters from restrictions
on loop variables. Such conditions persist when leaving the loop and the
loop variable is projected out. This results in a narrower domain for
exiting the loop than entering it and is logically impossible for
non-infinite loops.

We fix this by not adding a lower bound i>=0 when constructing BB
domains, but defer it to when also the upper bound it computed, which
was done redundantly even before this patch.

This reduces the number of LNT fails with -polly-process-unprofitable
-polly-position=before-vectorizer from 8 to 6.

llvm-svn: 264118
2016-03-22 23:27:42 +00:00
Tobias Grosser 5a8c052baf Invalidate scop on encountering a complex control flow
We bail out if current scop has a complex control flow as this could lead to
building of large domain conditions. This is to reduce compile time.  This
addresses r26382.

Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>

Differential Revision: http://reviews.llvm.org/D18362

llvm-svn: 264105
2016-03-22 22:05:32 +00:00
Tobias Grosser 0904c69110 ScopInfo: Do not generate dependences for i1 values used in affine branches
Affine branches are fully modeled and regenerated from the polyhedral domain and
consequently do not require any input conditions to be propagated.

llvm-svn: 263678
2016-03-16 23:33:54 +00:00
Tobias Grosser 2880f10aa4 tests: Fix some spelling mistakes
llvm-svn: 262649
2016-03-03 19:51:03 +00:00
Johannes Doerfert df88023d2b [FIX] Consolidation of loads with same pointer but different access relation
This should fix PR19422.

  Thanks to Jeremy Huddleston Sequoia for reporting this.
  Thanks to Roman Gareev for his investigation and the reduced test case.

llvm-svn: 262612
2016-03-03 12:26:58 +00:00
Johannes Doerfert 066dbf3f8e Track assumptions and restrictions separatly
In order to speed up compile time and to avoid random timeouts we now
  separately track assumptions and restrictions. In this context
  assumptions describe parameter valuations we need and restrictions
  describe parameter valuations we do not allow. During AST generation
  we create a runtime check for both, whereas the one for the
  restrictions is negated before a conjunction is build.

  Except the In-Bounds assumptions we currently only track restrictions.

Differential Revision: http://reviews.llvm.org/D17247

llvm-svn: 262328
2016-03-01 13:06:28 +00:00
Johannes Doerfert a792098047 Support calls with known ModRef function behaviour
Check the ModRefBehaviour of functions in order to decide whether or
  not a call instruction might be acceptable.

Differential Revision: http://reviews.llvm.org/D5227

llvm-svn: 261866
2016-02-25 14:08:48 +00:00
Johannes Doerfert 9dd42ee7c1 Try to build alias checks even when non-affine accesses are allowed
From now on we bail only if a non-trivial alias group contains a non-affine
  access, not when we discover aliasing and non-affine accesses are allowed.

llvm-svn: 261863
2016-02-25 14:06:11 +00:00
Roman Gareev 11001e1534 Annotation of SIMD loops
Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.

The buildbot shows the following compile/execution time changes:

  Compile time:
    Improvements    Δ     Previous  Current  σ
    …/gesummv      -6.06% 0.2640    0.2480   0.0055
    …/gemver       -4.46% 0.4480    0.4280   0.0044
    …/covariance   -4.31% 0.8360    0.8000   0.0065
    …/adi          -3.23% 0.9920    0.9600   0.0065
    …/doitgen      -2.53% 0.9480    0.9240   0.0090
    …/3mm          -2.33% 1.0320    1.0080   0.0087

  Execution time:
    Regressions     Δ     Previous  Current  σ
    …/viterbi       1.70% 5.1840    5.2720   0.0074
    …/smallpt       1.06% 12.4920   12.6240  0.0040

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D14491

llvm-svn: 261620
2016-02-23 09:00:13 +00:00
Johannes Doerfert cea6193b79 Support memory intrinsics
This patch adds support for memcpy, memset and memmove intrinsics. They are
  represented as one (memset) or two (memcpy, memmove) memory accesses in the
  polyhedral model. These accesses have an access range that describes the
  summarized effect of the intrinsic, i.e.,
    memset(&A[i], '$', N);
  is represented as a write access from A[i] to A[i+N].

Differential Revision: http://reviews.llvm.org/D5226

llvm-svn: 261489
2016-02-21 19:13:19 +00:00
Johannes Doerfert 4d9bb8d594 Allow all combinations of types and subscripts for memory accesses
To support non-aligned accesses we introduce a virtual element size
  for arrays that divides each access function used for this array. The
  adjustment of the access function based on the element size of the
  array was therefore moved after this virtual element size was
  determined, thus after all accesses have been created.

Differential Revision: http://reviews.llvm.org/D17246

llvm-svn: 261226
2016-02-18 16:50:12 +00:00
Johannes Doerfert 13637678b1 [FIX] LICM test case
llvm-svn: 260955
2016-02-16 12:10:42 +00:00
Johannes Doerfert 965edde695 Separate more constant factors of parameters
So far we separated constant factors from multiplications, however,
  only when they are at the outermost level of a parameter SCEV. Now,
  we also separate constant factors from the parameter SCEV if the
  outermost expression is a SCEVAddRecExpr. With the changes to the
  SCEVAffinator we can now improve the extractConstantFactor(...)
  function at will without worrying about any other code part. Thus,
  if needed we can implement a more comprehensive
  extractConstantFactor(...) function that will traverse the SCEV
  instead of looking only at the outermost level.

  Four test cases were affected. One did not change much and the other
  three were simplified.

llvm-svn: 260859
2016-02-14 22:30:56 +00:00
Johannes Doerfert 96e5471139 Separate invariant equivalence classes by type
We now distinguish invariant loads to the same memory location if they
  have different types. This will cause us to pre-load an invariant
  location once for each type that is used to access it. However, we can
  thereby avoid invalid casting, especially if an array is accessed
  though different typed/sized invariant loads.

  This basically reverts the changes in r260023 but keeps the test
  cases.

llvm-svn: 260045
2016-02-07 17:30:13 +00:00
Johannes Doerfert e708790c59 [FIX] Two "off-by-one" error in constant range usage
llvm-svn: 260031
2016-02-07 13:59:03 +00:00
Tobias Grosser 8ebdc2dd53 Make memory accesses with different element types optional
We also disable this feature by default, as there are still some issues in
combination with invariant load hoisting that slipped through my initial
testing.

llvm-svn: 260025
2016-02-07 08:48:57 +00:00
Tobias Grosser 107cd5f5f6 IslNodeBuilder: Invariant load hoisting of elements with differing sizes
Always use access-instruction pointer type to load the invariant values.
Otherwise mismatches between ScopArrayInfo element type and memory access
element type will result in invalid casts. These type mismatches are after
r259784 a lot more common and also arise with types of different size, which
have not been handled before.

Interestingly, this change actually simplifies the code, as we now have only
one code path that is always taken, rather then a standard code path for the
common case and a "fixup" code path that replaces the standard code path in
case of mismatching types.

llvm-svn: 260009
2016-02-06 21:23:39 +00:00
Michael Kruse 2e02d560aa Follow uses to create value MemoryAccesses
The previously implemented approach is to follow value definitions and
create write accesses ("push defs") while searching for uses. This
requires the same relatively validity- and requirement conditions to be
replicated at multiple locations (PHI instructions, other instructions,
uses by PHIs).

We replace this by iterating over the uses in a SCoP ("pull in
requirements"), and add writes only when at least one read has been
added. It turns out to be simpler code because each use is only iterated
over once and writes are added for the first access that reads it. We
need another iteration to identify escaping values (uses not in the
SCoP), which also makes the difference between such accesses more
obvious. As a side-effect, the order of scalar MemoryAccess can change.

Differential Revision: http://reviews.llvm.org/D15706

llvm-svn: 259987
2016-02-06 09:19:40 +00:00
Tobias Grosser d840fc7277 Support accesses with differently sized types to the same array
This allows code such as:

void multiple_types(char *Short, char *Float, char *Double) {
  for (long i = 0; i < 100; i++) {
    Short[i] = *(short *)&Short[2 * i];
    Float[i] = *(float *)&Float[4 * i];
    Double[i] = *(double *)&Double[8 * i];
  }
}

To model such code we use as canonical element type of the modeled array the
smallest element type of all original array accesses, if type allocation sizes
are multiples of each other. Otherwise, we use a newly created iN type, where N
is the gcd of the allocation size of the types used in the accesses to this
array. Accesses with types larger as the canonical element type are modeled as
multiple accesses with the smaller type.

For example the second load access is modeled as:

  { Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }

To support code-generating these memory accesses, we introduce a new method
getAccessAddressFunction that assigns each statement instance a single memory
location, the address we load from/store to. Currently we obtain this address by
taking the lexmin of the access function. We may consider keeping track of the
memory location more explicitly in the future.

We currently do _not_ handle multi-dimensional arrays and also keep the
restriction of not supporting accesses where the offset expression is not a
multiple of the access element type size. This patch adds tests that ensure
we correctly invalidate a scop in case these accesses are found. Both types of
accesses can be handled using the very same model, but are left to be added in
the future.

We also move the initialization of the scop-context into the constructor to
ensure it is already available when invalidating the scop.

Finally, we add this as a new item to the 2.9 release notes

Reviewers: jdoerfert, Meinersbur

Differential Revision: http://reviews.llvm.org/D16878

llvm-svn: 259784
2016-02-04 13:18:42 +00:00
Tobias Grosser e2c31210b2 Revert "Support loads with differently sized types from a single array"
This reverts commit (@259587). It needs some further discussions.

llvm-svn: 259629
2016-02-03 05:53:27 +00:00
Tobias Grosser 5d3fc1ea43 Support loads with differently sized types from a single array
We support now code such as:

void multiple_types(char *Short, char *Float, char *Double) {
  for (long i = 0; i < 100; i++) {
    Short[i] = *(short *)&Short[2 * i];
    Float[i] = *(float *)&Float[4 * i];
    Double[i] = *(double *)&Double[8 * i];
  }
}

To support such code we use as element type of the modeled array the smallest
element type of all original array accesses. Accesses with larger types are
modeled as multiple accesses with the smaller type.

For example the second load access is modeled as:

  { Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }

To support jscop-rewritable memory accesses we need each statement instance to
only be assigned a single memory location, which will be the address at which
we load the value. Currently we obtain this address by taking the lexmin of
the access function. We may consider keeping track of the memory location more
explicitly in the future.

llvm-svn: 259587
2016-02-02 22:05:29 +00:00
Tobias Grosser c2fd8b411d ScopInfo: Correct schedule construction
For schedule generation we assumed that the reverse post order traversal used by
the domain generation is sufficient, however it is not. Once a loop is
discovered, we have to completely traverse it, before we can generate the
schedule for any block/region that is only reachable through a loop exiting
block.

To this end, we add a "loop stack" that will keep track of loops we
discovered during the traversal but have not yet traversed completely.
We will never visit a basic block (or region) outside the most recent
(thus smallest) loop in the loop stack but instead queue such blocks
(or regions) in a waiting list. If the waiting list is not empty and
(might) contain blocks from the most recent loop in the loop stack the
next block/region to visit is drawn from there, otherwise from the
reverse post order iterator.

We exploit the new property of loops being always completed before additional
loops are processed, by removing the LoopSchedules map and instead keep all
information in LoopStack. This clarifies that we indeed always only keep a
stack of in-process loops, but will never keep incomplete schedules for an
arbitrary set of loops. As a result, we can simplify some of the existing code.

This patch also adds some more documentation about how our schedule construction
works.

This fixes http://llvm.org/PR25879

This patch is an modified version of Johannes Doerfert's initial fix.

Differential Revision: http://reviews.llvm.org/D15679

llvm-svn: 259354
2016-02-01 11:54:13 +00:00
Michael Kruse fd46308de4 ScopInfo: Never add read accesses for synthesizable values
Before adding a MK_Value READ MemoryAccess, check whether the read is
necessary or synthesizable. Synthesizable values are later generated by
the SCEVExpander and therefore do not need to be transferred
explicitly. This can happen because the check for synthesizability has
presumbly been forgotten in the case where a phi's incoming value has
been defined in a different statement.

Differential Revision: http://reviews.llvm.org/D15687

llvm-svn: 258998
2016-01-27 22:51:56 +00:00
Michael Kruse ee6a4fc680 Unique phi write accesses
Ensure that there is at most one phi write access per PHINode and
ScopStmt. In particular, this would be possible for non-affine
subregions with multiple exiting blocks. We replace multiple MAY_WRITE
accesses by one MUST_WRITE access. The written value is constructed
using a PHINode of all exiting blocks. The interpretation of the PHI
WRITE's "accessed value" changed from the incoming value to the PHI like
for PHI READs since there is no unique incoming value.

Because region simplification shuffles around PHI nodes -- particularly
with exit node PHIs -- the PHINodes at analysis time does not always
exist anymore in the code generation pass. We instead remember the
incoming block/value pair in the MemoryAccess.

Differential Revision: http://reviews.llvm.org/D15681

llvm-svn: 258809
2016-01-26 13:33:27 +00:00
Michael Kruse 436db620e7 Unique value write accesses
Ensure there is at most one write access per definition of an
llvm::Value. Keep track of already created value write access by using
a (dense) map.

Replace addValueWriteAccess by ensureValueStore which can be uses more
liberally without worrying to add redundant accesses. It will be used,
e.g. in a logical correspondant for value reads -- ensureValueReload --
to ensure that the expected definition has been written when loading it.

Differential Revision: http://reviews.llvm.org/D15483

llvm-svn: 258807
2016-01-26 13:33:10 +00:00
Johannes Doerfert 6f50c29ab2 [FIX] Domain generation error due to loops in non-affine regions
llvm-svn: 258803
2016-01-26 11:03:25 +00:00
Johannes Doerfert 432658d7b8 [FIX] Build correct domain for non-affine region SCoPs
llvm-svn: 258802
2016-01-26 11:01:41 +00:00
Tobias Grosser b3a9538e95 Remove irreducible control flow from test case
The test case we look at does not necessarily require irreducible control flow,
but a normal loop is sufficient to create a non-affine region containing more
than one basic block that dominates the exit node. We replace this irreducible
control flow with a normal loop for the following reasons:

  1) This is easier to understand
  2) We will subsequently commit a patch that ensures Polly does not process
     irreducible control flow.

Within non-affine regions, we could possibly handle irreducible control flow.

llvm-svn: 258496
2016-01-22 09:33:33 +00:00
Michael Kruse 959a8dc39f Update to ISL 0.16.1
llvm-svn: 257898
2016-01-15 15:54:45 +00:00
Michael Kruse 5a9a65e43f Prepare unit tests for update to ISL 0.16
ISL 0.16 will change how sets are printed which breaks 117 unit tests
that text-compare printed sets. This patch re-formats most of these unit
tests using a script and small manual editing on top of that. When
actually updating ISL, most work is done by just re-running the script
to adapt to the changed output.

Some tests that compare IR and tests with single CHECK-lines that can be
easily updated manually are not included here.

The re-format script will also be committed afterwards. The per-test
formatter invocation command lines options will not be added in the near
future because it is ad hoc and would overwrite the manual edits.
Ideally it also shouldn't be required anymore because ISL's set printing
has become more stable in 0.16.

Differential Revision: http://reviews.llvm.org/D16095

llvm-svn: 257851
2016-01-15 00:48:42 +00:00
Roman Gareev 10595a1739 Call assumeNoOutOfBound only in updateDimensionality
Call assumeNoOutOfBound only in updateDimensionality to process situations
when new dimensions are added and new bounds checks are required.

Contributed-by: Tobias Grosser, Gareev Roman
llvm-svn: 257170
2016-01-08 14:01:59 +00:00
Johannes Doerfert 30e2307f61 [FIX] Schedule generation for block exiting multiple loops.
This fixes bug PR25604.

llvm-svn: 256125
2015-12-20 17:12:22 +00:00
Tobias Grosser 75dc40c3be ScopInfo: Bail out in case of complex branch structures
Scops that contain many complex branches are likely to result in complex domain
conditions that consist of a large (> 100) number of conjucts.  Transforming
such domains is expensive and unlikely to result in efficient code.  To avoid
long compile times we detect this case and skip such scops. In the future we may
improve this by either using non-affine subregions to hide such complex
condition structures or by exploiting in certain cases properties (e.g.,
dominance) that allow us to construct the domains of a scop in a way that
results in a smaller number improving conjuncts.

Example of a code that results in complex iteration spaces:

      loop.header
     /    |    \ \
   A0    A2    A4 \
     \  /  \  /    \
      A1    A3      \
     /  \  /  \     |
   B0    B2    B4   |
     \  /  \  /     |
      B1    B3      ^
     /  \  /  \     |
   C0    C2    C4   |
     \  /  \  /    /
      C1    C3    /
       \   /     /
    loop backedge

llvm-svn: 256123
2015-12-20 13:31:48 +00:00
Tobias Grosser f4f6870ff2 Revert "Always treat scalar writes as MUST_WRITEs"
This reverts commit r255471.

Johannes raised in the post-commit review of r255471 the concern that PHI
writes in non-affine regions with two exiting blocks are not really MUST_WRITE,
but we just know that at least one out of the set of all possible PHI writes
will be executed. Modeling all PHI nodes as MUST_WRITEs is probably save, but
adding the needed documentation for such a special case is probably not worth
the effort. Michael will be proposing a new patch that ensures only a single
PHI_WRITE is created for non-affine regions, which - besides other benefits -
should also allow us to use a single well-defined MUST_WRITE for such PHI
writes.

(This is not a full revert, but the condition and documentation have been
slightly extended)

llvm-svn: 255503
2015-12-14 15:05:37 +00:00
Michael Kruse e0d135c536 Add unit test for r255473
Check that memory accesses in non-affine regions that are always executed are
MUST_WRITE.

llvm-svn: 255500
2015-12-14 14:53:30 +00:00
Michael Kruse b06e3029d1 Always treat scalar writes as MUST_WRITEs
LLVM's IR guarantees that a value definition occurs before any use, and
also the value of a PHI must be one of the incoming values, "written"
in one of the incoming blocks. Hence, such writes are never conditional
in the context of a non-affine subregion.

llvm-svn: 255471
2015-12-13 22:10:32 +00:00
Tobias Grosser 2d3d4ec860 executeScopConditionally: Introduce special exiting block
When introducing separate control flow for the original and optimized code we
introduce now a special 'ExitingBlock':

      \   /
    EnteringBB
        |
    SplitBlock---------\
   _____|_____         |
  /  EntryBB  \    StartBlock
  |  (region) |        |
  \_ExitingBB_/   ExitingBlock
        |              |
    MergeBlock---------/
        |
      ExitBB
      /    \

This 'ExitingBlock' contains code such as the final_reloads for scalars, which
previously were just added to whichever statement/loop_exit/branch-merge block
had been generated last. Having an explicit basic block makes it easier to
find these constructs when looking at the CFG.

llvm-svn: 255107
2015-12-09 11:38:22 +00:00
Tobias Grosser 2f8e43d677 ScopInfo: Add support for delinearizing fortran arrays
gfortran (and fortran in general?) does not compute the address of an array
element directly from the array sizes (e.g., %s0, %s1), but takes first the
maximum of the sizes and 0 (e.g., max(0, %s0)) before multiplying the resulting
value with the per-dimension array subscript expressions. To successfully
delinearize index expressions as we see them in fortran, we first filter 'smax'
expressions out of the SCEV expression, use them to guess array size parameters
and only then continue with the existing delinearization.

llvm-svn: 253995
2015-11-24 17:06:38 +00:00
Tobias Grosser 9737c7b431 ScopInfo: Remove domains of error blocks (and blocks they dominate) early on
Trying to build up access functions for any of these blocks is likely to fail,
as error blocks may contain invalid/non-representable instructions, and blocks
dominated by error blocks may reference such instructions, which wil also cause
failures. As all of these blocks are anyhow assumed to not be executed, we can
just remove them early on.

This fixes http://llvm.org/PR25596

llvm-svn: 253818
2015-11-22 11:06:51 +00:00
Tobias Grosser 020fa09a3c Remove -polly-code-generator=isl from many test cases
This is the default since a long time. Setting it again does not add value
in any of these test cases.

llvm-svn: 253800
2015-11-21 23:05:48 +00:00
Johannes Doerfert dec27df588 [FIX] Get the correct loop that surrounds a region
llvm-svn: 253788
2015-11-21 16:56:13 +00:00
Johannes Doerfert 55b3d8b831 Consistenly use getTypeAllocSize for size estimation.
Only when we check for wrapping we want to use the store size, for all
  other cases we use the alloc size now.

Suggested by: Tobias Grosser <tobias@grosser.es>

llvm-svn: 252941
2015-11-12 20:15:08 +00:00
Johannes Doerfert 2af10e2eed Use parameter constraints provided via llvm.assume
If an llvm.assume dominates the SCoP entry block and the assumed condition
  can be expressed as an affine inequality we will now add it to the context.

Differential Revision: http://reviews.llvm.org/D14413

llvm-svn: 252851
2015-11-12 03:25:01 +00:00
Johannes Doerfert d84493e52e Emit remarks for taken assumptions
Differential Revision: http://reviews.llvm.org/D14412

llvm-svn: 252848
2015-11-12 02:33:38 +00:00
Johannes Doerfert 0cf4e0aa42 Emit remark about aliasing pointers
llvm-svn: 252847
2015-11-12 02:32:51 +00:00
Johannes Doerfert 48fe86f1ff Emit SCoP source location as remark during ScopInfo
This removes a similar feature from ScopDetection, though with
  -polly-report that feature present twice anyway.

llvm-svn: 252846
2015-11-12 02:32:32 +00:00