Originally, we intersected the iteration space with the AssumedContext before
computing the minimal/maximal memory offset in our run-time alias checks. This
patch drops that intersection, as the AssumedContext can become very
complicated (contain many disjuncts) for larger or more complex SCoPs. When an
object with many disjuncts is intersected with other objects, the number of
disjuncts in those objects also grows quickly, which unnecessarily increases
compile time.
With this patch we see a 3.17% reduction in compile time for 3mm with default
flags and a 17.87% reduction when compiling 3mm with -DPOLYBENCH_USE_C99_PROTO.
We did not observe any regressions in LNT.
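The following stand-alone snippet (plain isl C API, not Polly code; the set
contents are made up for illustration) sketches why the intersection was
costly: every disjunct of the iteration space is combined with every disjunct
of the context.
  #include <isl/ctx.h>
  #include <isl/set.h>
  #include <cstdio>

  int main() {
    isl_ctx *Ctx = isl_ctx_alloc();
    // An iteration space with two disjuncts.
    isl_set *Domain = isl_set_read_from_str(
        Ctx, "[n] -> { S[i] : 0 <= i < n; S[i] : 2 * n <= i < 3 * n }");
    // An assumed context with three disjuncts.
    isl_set *Assumed = isl_set_read_from_str(
        Ctx, "[n] -> { : n = 16 or n = 32 or n = 64 }");
    // The intersection pairs up the disjuncts (here up to 2 x 3 = 6).
    isl_set *Restricted =
        isl_set_intersect_params(isl_set_copy(Domain), Assumed);
    printf("disjuncts before: %d, after: %d\n",
           isl_set_n_basic_set(Domain), isl_set_n_basic_set(Restricted));
    isl_set_free(Domain);
    isl_set_free(Restricted);
    isl_ctx_free(Ctx);
    return 0;
  }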
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12198
llvm-svn: 245617
To avoid multiple exits and the resulting complicated conditions when
creating a SCoP we now use the single hasFeasibleRuntimeContext()
check to decide if a SCoP should be dismissed right after
construction. If building the runtime checks failed, the assumed context is
made infeasible; hence the optimized version will never be executed and the
SCoP can be dismissed.
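As a rough sketch (the helper below is hypothetical and only stands in for the
caller; Scop::hasFeasibleRuntimeContext() is the actual Polly interface), the
decision now boils down to a single check:
  #include "polly/ScopInfo.h"

  // Hypothetical helper, not the actual ScopInfo code: decide whether a
  // freshly built SCoP should be dismissed right after construction.
  static bool shouldDismiss(polly::Scop &S) {
    // If building the runtime checks failed, the assumed context was made
    // infeasible, so the optimized version could never execute.
    return !S.hasFeasibleRuntimeContext();
  }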
llvm-svn: 245593
If nothing is executed we can bail out early. Otherwise we can use the
constraints that ensure at least one statement is executed for
simplification.
llvm-svn: 245585
We will record if a SAI is the base of another SAI or derived from it. This
will allow us to reason about indirect base pointers later on and gives a
clearer picture of indirection in the SCoP dump.
llvm-svn: 245584
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we compute
into scalar registers.
llvm-svn: 245564
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.
llvm-svn: 245563
Instead of generating code for an empty assumed context we bail out
early. As the number of assumptions we generate increases this becomes
more and more important. Additionally, this change will allow us to
hide internal contexts that are only used in runtime checks, e.g., a
boundary context with constraints not suited for simplifications.
llvm-svn: 245540
To make alias scope metadata generation work in OpenMP mode we now provide
the ScopAnnotator with information about the base pointer rewrite that happens
when passing arrays into the OpenMP subfunction.
llvm-svn: 245451
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations, which are then interchanged to the innermost level such that
LLVM's inner loop vectorizer (or Polly's simple vectorizer) can easily
vectorize this loop. The number of loop iterations to strip-mine is now
configurable with the option -polly-prevect-width=<number-of-prevec-dims>.
This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
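For illustration, the loop structure prevectorization aims for looks roughly
like the following (a hand-written C++ sketch of the transformed nest,
assuming a width of 4; Polly performs this transformation on the polyhedral
schedule, not on source code):
  #include <algorithm>

  // The outer i-loop has been strip-mined by the prevectorization width
  // (4 here) and the resulting 4-iteration strip moved to the innermost
  // position, giving a small, constant-trip-count innermost loop that an
  // inner loop vectorizer can handle.
  void prevectorized(float *A, const float *B, int N, int M) {
    for (int ii = 0; ii < N; ii += 4)
      for (int j = 0; j < M; ++j)
        for (int i = ii; i < std::min(ii + 4, N); ++i) // vectorizable strip
          A[i * M + j] += B[i * M + j];
  }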
llvm-svn: 245424
This test was written to check the workings of IndependentBlocks on arrays,
but IndependentBlocks no longer performs such transformations. The test itself
is still useful to check that the region is rejected as a SCoP.
llvm-svn: 245353
This patch changes Polly to compute the data-dependences on the schedule tree
instead of a flat schedule representation. Calculating dependences directly on
the schedule tree results in some good compile-time improvements (adi : -23.35%,
3mm : -9.57%), as the structure of the schedule can be exploited for increased
efficiency.
Earlier experiments with schedule tree based dependence analysis in Polly
showed some compile-time regressions. These regressions arose because the
schedule tree based dependence analysis did not take the domain constraints of
the schedule tree into account. As a result, the computed dependences were
different, and in some cases this difference caused the schedule optimizer to
take a very long time.
Since isl version fe865996 the schedule tree based dependence analysis takes
domain constraints into account, which fixes the earlier compile-time issues.
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 245300
executeScopConditionally would destroy a predecessor region if the SCoP's
entry was the region's exit block, by forking it to polly.start and thus
creating a second exit out of the region. This patch "shrinks" the
predecessor region such that polly.split_new_and_old is no longer the
region's exit.
llvm-svn: 245294
The SCEVExpander cannot deal with all SCEVs Polly allows in all kinds
of expressions. To this end we introduce a ScopExpander that handles
the additional expressions separately and falls back to the
SCEVExpander for everything else.
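The fallback structure is roughly the following (a simplified sketch; the
predicate and the custom handler are hypothetical placeholders, only
llvm::SCEVExpander::expandCodeFor is the real LLVM interface, and the header
path is the one used at the time of this commit):
  #include "llvm/Analysis/ScalarEvolutionExpander.h"

  namespace {
  // Not the actual ScopExpander: just the handle-or-fall-back shape.
  struct ExpanderSketch {
    llvm::SCEVExpander &Expander; // generic expander used as fallback

    llvm::Value *expand(const llvm::SCEV *E, llvm::Type *Ty,
                        llvm::Instruction *InsertPt) {
      if (isPollyOnlyExpr(E))                    // hypothetical predicate
        return expandPollyOnly(E, Ty, InsertPt); // hypothetical handler
      return Expander.expandCodeFor(E, Ty, InsertPt);
    }

    // Placeholders so the sketch is self-contained.
    bool isPollyOnlyExpr(const llvm::SCEV *) { return false; }
    llvm::Value *expandPollyOnly(const llvm::SCEV *, llvm::Type *,
                                 llvm::Instruction *) { return nullptr; }
  };
  } // namespace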
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12066
llvm-svn: 245288
The new field in the MemoryAccess allows us to track a value related
to that access:
- For real memory accesses the value is the loaded result or the
stored value.
- For straight-line scalar accesses it is the access instruction
itself.
- For PHI operand accesses it is the operand value.
We use this value to simplify code that previously had to deduce information
about the value later in the Polly pipeline and was known to be error-prone.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12062
llvm-svn: 245213
This allows the code generation to continue working even if a needed
value (that is reloaded anyway) was not yet demoted. Instead of failing, it
will now create the location for future demotion to memory and load from that
location. The stores will use the same location and, by construction, execute
before the load even if the textual order in the generated AST suggests
otherwise.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12072
llvm-svn: 245203
This test case crashes the scalar code generation as we are not
consistent with the usage of the assumed context. To be precise, we
use the assumed context for the dependence analysis but not to
restrict the domains of the statements.
A step by step explanation of the problem is given in the test case.
llvm-svn: 245176
This option allows the user to provide additional information about parameter
values as an isl_set. To specify that N has the value 1024, we can provide
the context -polly-context='[N] -> {: N = 1024}'.
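As an illustration of what such a parameter context means (plain isl C API,
not Polly internals; the statement domain below is made up), intersecting it
into a domain's parameter space fixes N:
  #include <isl/ctx.h>
  #include <isl/set.h>

  int main() {
    isl_ctx *Ctx = isl_ctx_alloc();
    isl_set *UserCtx = isl_set_read_from_str(Ctx, "[N] -> { : N = 1024 }");
    // A hypothetical statement domain parameterized by N.
    isl_set *Domain =
        isl_set_read_from_str(Ctx, "[N] -> { S[i] : 0 <= i < N }");
    // With N fixed to 1024 the domain describes exactly 1024 iterations.
    isl_set *Known = isl_set_intersect_params(Domain, UserCtx);
    isl_set_dump(Known);
    isl_set_free(Known);
    isl_ctx_free(Ctx);
    return 0;
  }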
llvm-svn: 245175
The July issue of TOPLAS contains a 50-page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral
mathematical program description, but also interesting insights into:
- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds are taken care of
by our AST generator.
- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parametric loop bounds, complex tile
shapes, and conditionally executed statements. Such unrolling support enables
the generation of predicated code, e.g., in the context of GPGPU computing.
- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.
- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.
- User-defined constraint sets for run-time checks: We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constraints actually holds. This feature is very useful to
verify at run time various assumptions that have been made during program
optimization.
Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transactions on Programming Languages and Systems (TOPLAS), 37(4), July 2015
llvm-svn: 245157
This modifies the order in which Polly passes are executed.
Assuming a function has two scops (A and B), the order before was:
FunctionPassManager
  ScopDetection
  IndependentBlocks
  TempScopInfo for A and B
  RegionPassManager
    ScopInfo for A
    DependenceInfo for A
    IslScheduleOptimizer for A
    IslAstInfo for A
    CodeGeneration for A
    ScopInfo for B
    DependenceInfo for B
    IslScheduleOptimizer for B
    IslAstInfo for B
    CodeGeneration for B
After this patch:
FunctionPassManager
  ScopDetection
  IndependentBlocks
  RegionPassManager
    TempScopInfo for A
    ScopInfo for A
    DependenceInfo for A
    IslScheduleOptimizer for A
    IslAstInfo for A
    CodeGeneration for A
    TempScopInfo for B
    ScopInfo for B
    DependenceInfo for B
    IslScheduleOptimizer for B
    IslAstInfo for B
    CodeGeneration for B
TempScopInfo for B might store information and references to the IR that
CodeGeneration for A might modify. Changing the order ensures that the IR is
not modified between the analysis of a region and its code generation.
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12014
llvm-svn: 245091
This change extends the BlockGenerator to allow not only Instructions as base
elements of scalar dependences, but any llvm::Value. This allows us to
code-generate scalar dependences which reference function arguments, as they
arise when modeling read-only scalar dependences.
llvm-svn: 244874