Commit Graph

973 Commits

Author SHA1 Message Date
Johannes Doerfert bda814350a Allow to disable unsigned operations (zext, icmp ugt, ...)
Unsigned operations are often useful to support but the heuristics are
not yet tuned. This options allows to disable them if necessary.

llvm-svn: 288521
2016-12-02 17:55:41 +00:00
Johannes Doerfert a94ae1aede Do not allow multiple possibly aliasing ptrs in an expression
Relational comparisons should not involve multiple potentially
  aliasing pointers. Similarly this should hold for switch conditions
  and the two conditions involved in equality comparisons (separately!).
  This is a heuristic based on the C semantics that does only allow such
  operations when the base pointers do point into the same object.
  Since this makes aliasing likely we will bail out early instead of
  producing a probably failing runtime check.

llvm-svn: 288516
2016-12-02 17:49:52 +00:00
Tobias Grosser bedef00e2c [ScopInfo] Fold constant coefficients in array dimensions to the right
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.

   struct com {
     double Real;
     double Img;
   };

   void foo(long n, struct com A[][n]) {
     for (long i = 0; i < 100; i++)
       for (long j = 0; j < 1000; j++)
         A[i][j].Real += A[i][j].Img;
   }

   int main() {
     struct com A[100][1000];
     foo(1000, A);

llvm-svn: 288489
2016-12-02 08:10:56 +00:00
Tobias Grosser 491b799a4d [ScopInfo] Separate construction and finalization of memory accesses [NFC]
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations, we take the assumptions that the
array shape we predict is such that no out-of-bounds memory accesses arise.

Before this change, the construction of the memory access, the access folding
that improves the represenation for certain parametric subscripts, and taking
the assumption was all done right after a memory access was created. In this
change we split this now into three separate iterations over all memory
accesses. This means only after all memory accesses have been built, we start
to canonicalize accesses, and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.

llvm-svn: 288479
2016-12-02 05:21:22 +00:00
Johannes Doerfert b1d6608430 [NFC] Check for feasibility prior to the profitability check
Feasibility is checked late on its own but early it is hidden behind
the "PollyProcessUnprofitable" guard. This change will make sure we opt
out early if the runtime context is infeasible anyway.

llvm-svn: 288329
2016-12-01 11:12:14 +00:00
Michael Kruse 11c5e07925 canSynthesize: Remove unused argument LI. NFC.
The helper function polly::canSynthesize() does not directly use the LoopInfo
analysis, hence remove it from its argument list.

llvm-svn: 288144
2016-11-29 15:11:04 +00:00
Tobias Grosser 278f9e7d27 [ScopInfo] Use SCEVRewriteVisitor to simplify SCEVSensitiveParameterRewriter [NFC]
llvm-svn: 287984
2016-11-26 17:58:40 +00:00
Tobias Grosser b45ae5601b [ScopDetect] Expand statistics of the detected scops
We now collect:

  Number of total loops
  Number of loops in scops
  Number of scops
  Number of scops with maximal loop depth 1
  Number of scops with maximal loop depth 2
  Number of scops with maximal loop depth 3
  Number of scops with maximal loop depth 4
  Number of scops with maximal loop depth 5
  Number of scops with maximal loop depth 6 and larger
  Number of loops in scops (profitable scops only)
  Number of scops (profitable scops only)
  Number of scops with maximal loop depth 1 (profitable scops only)
  Number of scops with maximal loop depth 2 (profitable scops only)
  Number of scops with maximal loop depth 3 (profitable scops only)
  Number of scops with maximal loop depth 4 (profitable scops only)
  Number of scops with maximal loop depth 5 (profitable scops only)
  Number of scops with maximal loop depth 6 and larger (profitable scops only)

These statistics are certainly completely accurate as we might drop scops
when building up their polyhedral representation, but they should give a good
indication of the number of scops we detect.

llvm-svn: 287973
2016-11-26 07:37:46 +00:00
Tobias Grosser 5c00b0dc74 [ScopDetectionDiagnostic] Collect statistics for each diagnostic type
Our original statistics were added before we introduced a more fine-grained
diagnostic system, but the granularity of our statistics has never been
increased accordingly. This change introduces now one statistic counter per
diagnostic to enable us to collect fine-grained statistics about who certain
scops are not detected. In case coarser grained statistics are needed, the
user is expected to combine counters manually.

llvm-svn: 287968
2016-11-26 05:53:09 +00:00
Tobias Grosser c64269ea1b [ScopDectionDiagnostic] Use scoped enums instead three letter prefix [NFC]
This improves readability of the code.

llvm-svn: 287963
2016-11-26 03:44:31 +00:00
Hongbin Zheng 4ff1c3983d Fix typo.
llvm-svn: 287819
2016-11-23 21:59:33 +00:00
Hongbin Zheng a8fb73fc0b Split ScopInfo::addScopStmt into two versions. NFC
One for adding statement for region, another one for BB

llvm-svn: 287566
2016-11-21 20:09:40 +00:00
Tobias Grosser 1f0236d8e5 [ScopDetect] Use mayReadOrWriteMemory to shorten condition
llvm-svn: 287525
2016-11-21 09:07:30 +00:00
Tobias Grosser b94e9b31d0 [ScopDetect] Remove unnecessary namespace qualifier
llvm-svn: 287524
2016-11-21 09:04:45 +00:00
Johannes Doerfert 81aa6e882f [NFC] Adjust naming scheme of statistic variables
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 287347
2016-11-18 14:37:08 +00:00
Johannes Doerfert 6cd59e9076 Probably overwritten loads should not be considered hoistable
Do not assume a load to be hoistable/invariant if the pointer is used by
another instruction in the SCoP that might write to memory and that is
always executed.

llvm-svn: 287272
2016-11-17 22:25:17 +00:00
Johannes Doerfert c97654681e [FIX] Do not try to hoist memory intrinsic
Since we do not necessarily treat memory intrinsics as non-affine
anymore, we have to check for them explicitly before we try to hoist an
access.

llvm-svn: 287270
2016-11-17 22:11:56 +00:00
Johannes Doerfert b3265a3612 [NFC] Skip over trivial assumptions
Filter trivial assumptions, thus assume { : } or restrict { : 0 = 1 },
as they clutter the user output as well as the statistics.

llvm-svn: 287269
2016-11-17 22:08:40 +00:00
Johannes Doerfert cfadb2293f [DBG] Collect statistics about statically infeasible SCoPs
llvm-svn: 287263
2016-11-17 21:44:47 +00:00
Johannes Doerfert cd195326bf [DBG] Collect statistics about taken assumptions
llvm-svn: 287261
2016-11-17 21:41:08 +00:00
Tobias Grosser 26be8e99b6 [ScopBuilder] Drop unnecessary namespace identifiers [NFC]
llvm-svn: 286781
2016-11-13 21:28:13 +00:00
Tobias Grosser 70d2709b1a [ScopDetect] Conservatively handle inaccessible memory alias attributes
Commit r286294 introduced support for inaccessiblememonly and
inaccessiblemem_or_argmemonly attributes to BasicAA, which we need to
support to avoid undefined behavior. This change just refuses all calls
which are annotated with these attributes, which is conservatively correct.
In the future we may consider to model and support such function calls
in Polly.

llvm-svn: 286771
2016-11-13 19:27:24 +00:00
Tobias Grosser a2f8fa33aa [ScopDetect] Evaluate and verify branches at branch condition, not icmp
The validity of a branch condition must be verified at the location of the
branch (the branch instruction), not the location of the icmp that is
used in the branch instruction. When verifying at the wrong location, we
may accept an icmp that is defined within a loop which itself dominates, but
does not contain the branch instruction. Such loops cannot be modeled as
we only introduce domain dimensions for surrounding loops. To address this
problem we change the scop detection to evaluate and verify SCEV expressions at
the right location.

This issue has been around since at least r179148 "scop detection: properly
instantiate SCEVs to the place where they are used", where we explicitly
set the scope to the wrong location. Before this commit the scope
was not explicitly set, which probably also resulted in the scope around the
ICmp to be choosen.

This resolves http://llvm.org/PR30989

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286769
2016-11-13 19:27:04 +00:00
Tobias Grosser f67433abd9 SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr
Assumptions can either be added for a given basic block, in which case the set
describing the assumptions is expected to match the dimensions of its domain.
In case no basic block is provided a parameter-only set is expected to describe
the assumption.

The piecewise expressions that are generated by the SCEVAffinator sometimes
have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }),
which looks similar to a parameter-only domain, but is still a set domain.

This change adds an assert that checks that we always pass parameter domains to
addAssumptions if BB is empty to make mismatches here fail early.

We also change visitTruncExpr to always convert to parameter sets, if BB is
null. This change resolves http://llvm.org/PR30941

Another alternative to this change would have been to inspect all code to make
sure we directly generate in the SCEV affinator parameter sets in case of empty
domains. However, this would likely complicate the code which combines parameter
and non-parameter domains when constructing a statement domain. We might still
consider doing this at some point, but as this likely requires several non-local
changes this should probably be done as a separate refactoring.

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286444
2016-11-10 11:44:10 +00:00
Tobias Grosser bbaeda3fe5 Do not allow switch statements in loop latches
In r248701 "Allow switch instructions in SCoPs" support for switch statements
has been introduced, but support for switch statements in loop latches was
incomplete. This change completely disables switch statements in loop latches.

The original commit changed addLoopBoundsToHeaderDomain to support non-branch
terminator instructions, but this change was incorrect: it added a check for
BI != null to the if-branch of a condition, but BI was used in the else branch
es well. As a result, when a non-branch terminator instruction is encounted a
nullptr dereference is triggered. Due to missing test coverage, this bug was
overlooked.

r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow
switch statements for non-affine loops, if they appear in either a loop latch
or a loop exit. We adapt this code to now prohibit switch statements in
loop latches even if the control condition is affine.

We could possibly add support for switch statements in loop latches, but such
support should be evaluated and tested separately.

This fixes llvm.org/PR30952

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286426
2016-11-10 05:20:29 +00:00
Tobias Grosser eba86a1208 ScopInfo: only run code needed for ASSERT in DEBUG mode
Suggested-by: Johannes Doerfert
llvm-svn: 286338
2016-11-09 04:24:49 +00:00
Tobias Grosser 744740ad91 ScopInfo: Ensure copy statement memory accesses are correct
Add asserts that verify that the memory accesses of a new copy statement
are defined for all domain instances the copy statement is defined for.

llvm-svn: 286047
2016-11-05 21:02:43 +00:00
Michael Kruse e1dc387731 [ScopInfo] Fix isl object leak.
Fix return from function without releasing isl objects, which was introduced
in r269055.

llvm-svn: 285924
2016-11-03 15:19:41 +00:00
Eli Friedman b9c6f01a81 [ScopInfo] Make memset etc. affine where possible.
We don't actually check whether a MemoryAccess is affine in very many
places, but one important one is in checks for aliasing.

Differential Revision: https://reviews.llvm.org/D25706

llvm-svn: 285746
2016-11-01 20:53:11 +00:00
Tobias Grosser ebb626e4b7 [ScopDetect] Use SCEVRewriteVisitor to simplify SCEVRemoveSMax rewriter
ScalarEvolution got at some pointer a SCEVRewriteVisitor. Use it to simplify
our SCEVRemoveSMax visitor.

llvm-svn: 285491
2016-10-29 06:19:34 +00:00
Michael Kruse 426e6f71f8 [ScopInfo] Fix: use raw source pointer.
When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw
source and target pointers which preserve bitcasts.
MemAccInst::getPointerOperand() also returns the raw target pointers, but
Scop::buildAliasGroups() did not for the source pointer. This lead to mismatches
between AliasSetTracker and ScopInfo on which pointer to use.

Fixed by also using raw pointers in Scop::buildAliasGroups().

llvm-svn: 285071
2016-10-25 13:37:43 +00:00
Mandeep Singh Grang 48e7add80f [polly] Change SmallPtrSet which are being iterated into SmallSetVector
Summary: Otherwise the lack of an iteration order results in non-determinism in codegen.

Reviewers: _jdoerfert, zinob, grosser

Tags: #polly

Differential Revision: https://reviews.llvm.org/D25863

llvm-svn: 284845
2016-10-21 17:29:10 +00:00
Michael Kruse 6a19d592da [ScopDetect] Depend transitively on ScalarEvolution.
ScopDetection might be queried by -dot-scops or -view-scops passes for which
it accesses ScalarEvolution.

llvm-svn: 284385
2016-10-17 13:29:20 +00:00
Michael Kruse fa53c86dc1 [ScopInfo/CodeGen] ExitPHI reads are implicit.
Under some conditions MK_Value read accessed where converted to MK_ExitPHI read
accessed. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.

Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.

Do not convert value accessed to ExitPHI accesses and do not handle
value accesses like ExitPHI accessed in CodeGen anymore.

llvm-svn: 284023
2016-10-12 16:31:09 +00:00
Michael Kruse c9edc2ee8d [DepInfo] Print -debug output outside of max-operations scope.
ISL tries to simplify the polyhedral operations before printing its objects.
This increases the operations counter and therefore can contribute to hitting
the operations limit. Therefore the result could be different when -debug output
is enabled, making debugging harder.

llvm-svn: 283745
2016-10-10 11:45:59 +00:00
Michael Kruse 8bfba1ff46 [Support/DepInfo] Introduce IslMaxOperationsGuard and make DepInfo use it. NFC.
IslMaxOperationsGuard defines a scope where ISL may abort operations because if
it takes too many operations. Replace the call to the raw ISL interface by a
use of the guard.

IslMaxOperationsGuard provides a uniform way to define a maximal computation
time for a code region in C++ using RAII.

llvm-svn: 283744
2016-10-10 11:45:54 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Michael Kruse 6ab4476835 [ScopInfo] Add -polly-unprofitable-scalar-accs option.
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. Such a statement instances
necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be
changed to array accesses, which can avoid these WAW-dependence.

llvm-svn: 283233
2016-10-04 17:33:39 +00:00
Michael Kruse ca7cbcca37 [ScopInfo] Scalar access do not have indirect base pointers.
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.

This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test SAI tried to get the origin base pointer that was only declared later,
therefore not existing. This is probably only possible for scalars used in
PHINode incoming blocks.

llvm-svn: 283232
2016-10-04 17:33:34 +00:00
Tobias Grosser 349d1c3368 [ScopDetection] Remove redundant checks for endless loops
Summary:
Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks
to reject endless loops which are now removed and replaced by a single check
in `isValidLoop()`.

For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is
renamed to `ReportLoopHasNoExit`. The test case
`ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well.

The schedule generation in `buildSchedule()` is based on the following
assumption:

Given some block B that is contained in a loop L and a SESE region R,
we assume that L is contained in R or the other way around.

However, this assumption is broken in the presence of endless loops that are
nested inside other loops. Therefore, in order to prevent erroneous behavior
in `buildSchedule()`, r265280 introduced a corresponding check in
`canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible
to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding
another check to reject endless loops in `allowOverApproximatedRegion()` in
r273905. Hence there existed two separate locations that handled this case.

Thank you Johannes Doerfert for helping to provide the above background
information.

Reviewers: Meinersbur, grosser

Subscribers: _jdoerfert, pollydev

Differential Revision: https://reviews.llvm.org/D24560

Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 281987
2016-09-20 17:05:22 +00:00
Tobias Grosser fe74a7a1f5 GPGPU: Detect read-only scalar arrays ...
and pass these by value rather than by reference.

llvm-svn: 281837
2016-09-17 19:22:18 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Michael Kruse 19c9d99f45 Use value directly instead of reference. NFC.
The alias to the array element is read-only and a primitive type (pointer),
therefore use the value directly instead of a reference to it.

llvm-svn: 281311
2016-09-13 09:56:05 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 55a7af7da5 ScopInfo: Make clear that no double-free problem exists
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_set_free always returns NULL and consequently later uses of the isl object
we just freed will never be reached. Without this knowledge, the analyser has
to issue a warning.

We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.

llvm-svn: 280940
2016-09-08 14:08:07 +00:00
Tobias Grosser b316dc166f ScopDetection: Make sure we do not accidentally divide by zero
This code path is likely never triggered, but by still handling this case
locally we avoid warnings in clangs static analyzer.

llvm-svn: 280939
2016-09-08 14:08:05 +00:00
Tobias Grosser adfc971820 DependenceInfo: Make clear that no double-free problem exists
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_union_map_free always returns NULL and consequently later uses of the isl
object we just freed will never be reached. Without this knowledge, the analyser
has to issue a warning.

We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.

llvm-svn: 280938
2016-09-08 14:08:01 +00:00
Tobias Grosser 2a526feec9 ScopInfo: Add missing __isl_take annotation
llvm-svn: 280923
2016-09-08 11:18:56 +00:00
Tobias Grosser 8d4cb1a060 ScopInfo: Do not derive assumptions from all GEP pointer instructions
... but instead rely on the assumptions that we derive for load/store
instructions.

Before we were able to delinearize arrays, we used GEP pointer instructions
to derive information about the likely range of induction variables, which
gave us more freedom during loop scheduling. Today, this is not needed
any more as we delinearize multi-dimensional memory accesses and as part
of this process also "assume" that all accesses to these arrays remain
inbounds. The old derive-assumptions-from-GEP code has consequently become
mostly redundant. We drop it both to clean up our code, but also to improve
compile time. This change reduces the scop construction time for 3mm in
no-asserts mode on my machine from 48 to 37 ms.

llvm-svn: 280601
2016-09-03 21:55:25 +00:00
Tobias Grosser 66c6506aac Dependences: Only create flat StmtSchedule in presence of reductions
Without reductions we do not need a flat union_map schedule describing
the computation we want to perform, but can work purely on the schedule
tree. This reduces the dependence computation and scheduling time from 33ms
to 25ms. Another 30% reduction.

llvm-svn: 280558
2016-09-02 23:40:15 +00:00