Unsigned operations are often useful to support, but the heuristics are
not yet tuned. This option allows disabling them if necessary.
llvm-svn: 288521
Relational comparisons should not involve multiple potentially
aliasing pointers. Similarly this should hold for switch conditions
and the two conditions involved in equality comparisons (separately!).
This is a heuristic based on the C semantics, which only allow such
operations when the base pointers point into the same object.
Since this makes aliasing likely we will bail out early instead of
producing a probably failing runtime check.
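As an illustration of the pattern this heuristic targets, the following C sketch compares two pointers relationally. The function names are hypothetical and not taken from the Polly test suite. C only defines `p < end` when both pointers refer to the same array, so the two pointers very likely alias.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: 'p' and 'end' are compared relationally, which C
 * only defines when both point into (or one past) the same array. */
static long sum_range(const long *p, const long *end) {
  long sum = 0;
  assert(p != NULL && end != NULL);
  while (p < end) /* relational comparison of two pointers */
    sum += *p++;
  return sum;
}

/* Helper for illustration: sums the first n elements of a. */
static long sum_first(const long *a, size_t n) {
  return sum_range(a, a + n);
}
```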
llvm-svn: 288516
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.
struct com {
  double Real;
  double Img;
};
void foo(long n, struct com A[][n]) {
  for (long i = 0; i < 100; i++)
    for (long j = 0; j < 1000; j++)
      A[i][j].Real += A[i][j].Img;
}
int main() {
  struct com A[100][1000];
  foo(1000, A);
}
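The index arithmetic behind the A[][2 * n] shape can be checked with a small C sketch. The helper name `flat_offset` is hypothetical, and the sketch assumes that `struct com` has no padding, so that each element occupies exactly two doubles.

```c
#include <assert.h>
#include <stddef.h>

struct com {
  double Real;
  double Img;
};

/* Offset, counted in doubles, of A[i][j].Real (field == 0) or
 * A[i][j].Img (field == 1) when A is viewed as double A'[][2 * n]. */
static size_t flat_offset(size_t n, size_t i, size_t j, size_t field) {
  assert(field < 2);
  return i * (2 * n) + (2 * j + field);
}
```

With n = 4, the access A[2][3].Img maps to offset 2 * 8 + 7 = 23 doubles from the start of the array.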
llvm-svn: 288489
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations do we take the assumption that the
array shape we predict is such that no out-of-bounds memory accesses arise.
Before this change, the construction of the memory access, the access folding
that improves the representation for certain parametric subscripts, and taking
the assumption were all done right after a memory access was created. This
change splits this work into three separate iterations over all memory
accesses. This means that only after all memory accesses have been built do we
start to canonicalize accesses and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.
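The split into three iterations can be sketched in C as follows. All types and function names are hypothetical stand-ins, not Polly's MemoryAccess or its builder; the point is only that each phase runs over all accesses before the next phase starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory access. */
struct access {
  int built;
  int canonical;
  int assumed;
};

/* Phase 1: build every access. */
static void build_all(struct access *acc, size_t n) {
  for (size_t k = 0; k < n; k++) {
    acc[k].built = 1;
    acc[k].canonical = 0;
    acc[k].assumed = 0;
  }
}

/* Phase 2: canonicalize; every access has already been built, so a
 * canonicalization could inspect all of them. */
static void canonicalize_all(struct access *acc, size_t n) {
  for (size_t k = 0; k < n; k++) {
    assert(acc[k].built);
    acc[k].canonical = 1;
  }
}

/* Phase 3: take assumptions on the final, canonical shape. */
static void assume_all(struct access *acc, size_t n) {
  for (size_t k = 0; k < n; k++) {
    assert(acc[k].canonical);
    acc[k].assumed = 1;
  }
}

static void process_accesses(struct access *acc, size_t n) {
  build_all(acc, n);
  canonicalize_all(acc, n);
  assume_all(acc, n);
}
```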
llvm-svn: 288479
Feasibility is checked on its own late in the pipeline, but the early check
is hidden behind the "PollyProcessUnprofitable" guard. This change makes sure
we opt out early if the runtime context is infeasible anyway.
llvm-svn: 288329
We now collect:
Number of total loops
Number of loops in scops
Number of scops
Number of scops with maximal loop depth 1
Number of scops with maximal loop depth 2
Number of scops with maximal loop depth 3
Number of scops with maximal loop depth 4
Number of scops with maximal loop depth 5
Number of scops with maximal loop depth 6 and larger
Number of loops in scops (profitable scops only)
Number of scops (profitable scops only)
Number of scops with maximal loop depth 1 (profitable scops only)
Number of scops with maximal loop depth 2 (profitable scops only)
Number of scops with maximal loop depth 3 (profitable scops only)
Number of scops with maximal loop depth 4 (profitable scops only)
Number of scops with maximal loop depth 5 (profitable scops only)
Number of scops with maximal loop depth 6 and larger (profitable scops only)
These statistics are certainly not completely accurate as we might drop scops
when building up their polyhedral representation, but they should give a good
indication of the number of scops we detect.
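A minimal C sketch of how such per-depth counters could be bucketed is given below. The struct and function names are hypothetical and do not reflect Polly's actual statistic variables; the last bucket collects depth 6 and larger, as in the list above.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH_BUCKETS = 6 };

/* Hypothetical counters: by_depth[k - 1] counts scops with maximal loop
 * depth k; the last bucket collects depth 6 and larger. */
struct scop_stats {
  unsigned num_scops;
  unsigned loops_in_scops;
  unsigned by_depth[MAX_DEPTH_BUCKETS];
};

static void record_scop(struct scop_stats *s, unsigned num_loops,
                        unsigned max_depth) {
  assert(s != NULL);
  s->num_scops++;
  s->loops_in_scops += num_loops;
  if (max_depth == 0)
    return; /* the list above has no bucket for loop-free scops */
  if (max_depth > MAX_DEPTH_BUCKETS)
    max_depth = MAX_DEPTH_BUCKETS; /* clamp into the "6 and larger" bucket */
  s->by_depth[max_depth - 1]++;
}
```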
llvm-svn: 287973
Our original statistics were added before we introduced a more fine-grained
diagnostic system, but the granularity of our statistics has never been
increased accordingly. This change introduces one statistic counter per
diagnostic, which enables us to collect fine-grained statistics about why
certain scops are not detected. If coarser-grained statistics are needed, the
user is expected to combine counters manually.
llvm-svn: 287968
Do not assume a load to be hoistable/invariant if the pointer is used by
another instruction in the SCoP that might write to memory and that is
always executed.
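The following C sketch shows why such a load must not be treated as invariant. The functions are hypothetical illustrations: `sum_reload` is the original loop, and `sum_hoisted` is what an incorrect hoist of the load would compute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that writes through its argument. */
static void bump(int *p) { *p += 1; }

/* The load of *p sits in a loop that also writes through p on every
 * iteration, so the loaded value is not loop invariant. */
static int sum_reload(int *p, int n) {
  int sum = 0;
  assert(p != NULL);
  for (int i = 0; i < n; i++) {
    sum += *p; /* load */
    bump(p);   /* always-executed write through the same pointer */
  }
  return sum;
}

/* What hoisting the load out of the loop would compute instead. */
static int sum_hoisted(int *p, int n) {
  int sum = 0;
  int v;
  assert(p != NULL);
  v = *p; /* hoisted load */
  for (int i = 0; i < n; i++) {
    sum += v;
    bump(p);
  }
  return sum;
}
```

Starting from `*p == 1` and `n == 3`, the original loop sums 1 + 2 + 3 while the hoisted variant sums 1 + 1 + 1.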
llvm-svn: 287272
Since we do not necessarily treat memory intrinsics as non-affine
anymore, we have to check for them explicitly before we try to hoist an
access.
llvm-svn: 287270
Commit r286294 introduced support for inaccessiblememonly and
inaccessiblemem_or_argmemonly attributes to BasicAA, which we need to
support to avoid undefined behavior. This change just refuses all calls
which are annotated with these attributes, which is conservatively correct.
In the future we may consider modeling and supporting such function calls
in Polly.
llvm-svn: 286771
The validity of a branch condition must be verified at the location of the
branch (the branch instruction), not the location of the icmp that is
used in the branch instruction. When verifying at the wrong location, we
may accept an icmp that is defined within a loop which itself dominates, but
does not contain the branch instruction. Such loops cannot be modeled as
we only introduce domain dimensions for surrounding loops. To address this
problem we change the scop detection to evaluate and verify SCEV expressions at
the right location.
This issue has been around since at least r179148 "scop detection: properly
instantiate SCEVs to the place where they are used", where we explicitly
set the scope to the wrong location. Before this commit the scope
was not explicitly set, which probably also resulted in the scope around the
ICmp being chosen.
This resolves http://llvm.org/PR30989
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286769
Assumptions can either be added for a given basic block, in which case the set
describing the assumptions is expected to match the dimensions of its domain.
In case no basic block is provided a parameter-only set is expected to describe
the assumption.
The piecewise expressions that are generated by the SCEVAffinator sometimes
have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }),
which looks similar to a parameter-only domain, but is still a set domain.
This change adds an assert that checks that we always pass parameter domains to
addAssumptions if BB is empty to make mismatches here fail early.
We also change visitTruncExpr to always convert to parameter sets, if BB is
null. This change resolves http://llvm.org/PR30941
Another alternative to this change would have been to inspect all code to make
sure we directly generate in the SCEV affinator parameter sets in case of empty
domains. However, this would likely complicate the code which combines parameter
and non-parameter domains when constructing a statement domain. We might still
consider doing this at some point, but as this likely requires several non-local
changes this should probably be done as a separate refactoring.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286444
In r248701 "Allow switch instructions in SCoPs", support for switch statements
was introduced, but support for switch statements in loop latches was
incomplete. This change completely disables switch statements in loop latches.
The original commit changed addLoopBoundsToHeaderDomain to support non-branch
terminator instructions, but this change was incorrect: it added a check for
BI != null to the if-branch of a condition, but BI was used in the else branch
as well. As a result, when a non-branch terminator instruction is encountered, a
nullptr dereference is triggered. Due to missing test coverage, this bug was
overlooked.
r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow
switch statements for non-affine loops, if they appear in either a loop latch
or a loop exit. We adapt this code to now prohibit switch statements in
loop latches even if the control condition is affine.
We could possibly add support for switch statements in loop latches, but such
support should be evaluated and tested separately.
This fixes llvm.org/PR30952
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286426
Add asserts that verify that the memory accesses of a new copy statement
are defined for all domain instances the copy statement is defined for.
llvm-svn: 286047
We don't actually check whether a MemoryAccess is affine in very many
places, but one important one is in checks for aliasing.
Differential Revision: https://reviews.llvm.org/D25706
llvm-svn: 285746
When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw
source and target pointers which preserve bitcasts.
MemAccInst::getPointerOperand() also returns the raw target pointer, but
Scop::buildAliasGroups() did not use the raw source pointer. This led to
mismatches between AliasSetTracker and ScopInfo on which pointer to use.
Fixed by also using raw pointers in Scop::buildAliasGroups().
llvm-svn: 285071
Summary: Otherwise the lack of an iteration order results in non-determinism in codegen.
Reviewers: _jdoerfert, zinob, grosser
Tags: #polly
Differential Revision: https://reviews.llvm.org/D25863
llvm-svn: 284845
Under some conditions MK_Value read accesses were converted to MK_ExitPHI read
accesses. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.
Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.
Do not convert value accesses to ExitPHI accesses and do not handle
value accesses like ExitPHI accesses in CodeGen anymore.
llvm-svn: 284023
ISL tries to simplify the polyhedral operations before printing its objects.
This increases the operations counter and therefore can contribute to hitting
the operations limit. Therefore the result could be different when -debug output
is enabled, making debugging harder.
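The effect can be modeled with a toy operations budget in C. Everything in this sketch is hypothetical (it does not use the isl API, and the costs are invented); it only shows how an extra simplification triggered by debug printing can push a computation over the limit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an operations budget; not the isl API. */
struct ctx {
  unsigned long ops;
  unsigned long max_ops;
};

/* Charges 'cost' operations; returns 0 once the budget is exceeded. */
static int charge(struct ctx *c, unsigned long cost) {
  assert(c != NULL);
  c->ops += cost;
  return c->ops <= c->max_ops;
}

/* Printing simplifies the object first, which itself costs operations. */
static int debug_print(struct ctx *c) { return charge(c, 30); }

/* Runs a computation in two steps (50 + 30 operations), optionally
 * printing the intermediate result. Returns 1 on success, 0 on abort. */
static int run(unsigned long max_ops, int debug) {
  struct ctx c = {0, max_ops};
  if (!charge(&c, 50))
    return 0;
  if (debug && !debug_print(&c))
    return 0;
  return charge(&c, 30);
}
```

With a limit of 100 operations, the run without debug output succeeds (80 operations), while the run with debug output aborts (110 operations).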
llvm-svn: 283745
IslMaxOperationsGuard defines a scope in which ISL may abort a computation if
it takes too many operations. Replace the call to the raw ISL interface by a
use of the guard.
IslMaxOperationsGuard provides a uniform way to define a maximal computation
time for a code region in C++ using RAII.
llvm-svn: 283744
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
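For contrast, here is a sketch of a variadic function in the conventional form, where the last named parameter passed to va_start is a plain `const char *`. The function and its convention (one character in `desc` per following int argument) are hypothetical and unrelated to the changed code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic helper: 'desc' holds one character per following
 * int argument. The last named parameter is a plain pointer. */
static int sum_described(const char *desc, ...) {
  va_list args;
  int sum = 0;
  assert(desc != NULL);
  va_start(args, desc);
  for (const char *p = desc; *p != '\0'; ++p)
    sum += va_arg(args, int); /* read one int per descriptor character */
  va_end(args);
  return sum;
}
```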
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. The instances of such a
statement necessarily have WAW dependences among themselves. With DeLICM, scalar
accesses can be changed to array accesses, which can avoid these WAW dependences.
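The WAW dependence and its removal can be illustrated with the following C sketch. The functions are hypothetical, and the second one merely expands the scalar into one array slot per iteration; it is not a description of how DeLICM selects the array element to use.

```c
#include <assert.h>
#include <stddef.h>

/* Every iteration writes the same scalar 'tmp', so any two iterations
 * have a write-after-write dependence on it. */
static void scale_scalar(const int *in, int *out, size_t n) {
  int tmp;
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    tmp = in[i] * 2;
    out[i] = tmp + 1;
  }
}

/* Same computation with the scalar replaced by an array access: no two
 * iterations write the same location. */
static void scale_array(const int *in, int *out, int *tmp, size_t n) {
  assert(in != NULL && out != NULL && tmp != NULL);
  for (size_t i = 0; i < n; i++) {
    tmp[i] = in[i] * 2;
    out[i] = tmp[i] + 1;
  }
}
```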
llvm-svn: 283233
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.
This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test, SAI tried to get the origin of a base pointer that was only declared later
and therefore did not exist yet. This is probably only possible for scalars used
in PHINode incoming blocks.
llvm-svn: 283232
Summary:
Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks
to reject endless loops which are now removed and replaced by a single check
in `isValidLoop()`.
For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is
renamed to `ReportLoopHasNoExit`. The test case
`ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well.
The schedule generation in `buildSchedule()` is based on the following
assumption:
Given some block B that is contained in a loop L and a SESE region R,
we assume that L is contained in R or the other way around.
However, this assumption is broken in the presence of endless loops that are
nested inside other loops. Therefore, in order to prevent erroneous behavior
in `buildSchedule()`, r265280 introduced a corresponding check in
`canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible
to bypass this check with -polly-allow-nonaffine-loops, which was fixed in
r273905 by adding another check to reject endless loops in
`addOverApproximatedRegion()`. Hence there existed two separate locations that
handled this case.
Thank you Johannes Doerfert for helping to provide the above background
information.
Reviewers: Meinersbur, grosser
Subscribers: _jdoerfert, pollydev
Differential Revision: https://reviews.llvm.org/D24560
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 281987
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.
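The two building blocks named above, copying into a newly created array and a micro-kernel built from rank-1 updates, can be sketched in C as follows. This is a simplified illustration with hypothetical function names: real BLIS packing uses a micro-panel layout, and the code Polly generates differs from this.

```c
#include <assert.h>
#include <stddef.h>

/* Copies a rows x cols block of a row-major matrix with leading dimension
 * 'ld' into a contiguous buffer (a simplified stand-in for packing). */
static void pack_block(const double *src, size_t ld, size_t rows,
                       size_t cols, double *dst) {
  assert(src != NULL && dst != NULL && cols <= ld);
  for (size_t r = 0; r < rows; r++)
    for (size_t c = 0; c < cols; c++)
      dst[r * cols + c] = src[r * ld + c];
}

/* Micro-kernel sketch: C (m x n) += A (m x k) * B (k x n), all contiguous
 * and row-major, written as a loop over p around a rank-1 update. */
static void micro_kernel(size_t m, size_t n, size_t k, const double *A,
                         const double *B, double *C) {
  assert(A != NULL && B != NULL && C != NULL);
  for (size_t p = 0; p < k; p++)     /* loop around the rank-1 update */
    for (size_t i = 0; i < m; i++)   /* column p of A ... */
      for (size_t j = 0; j < n; j++) /* ... times row p of B */
        C[i * n + j] += A[i * k + p] * B[p * n + j];
}
```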
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23260
llvm-svn: 281441
The alias to the array element is read-only and a primitive type (pointer),
therefore use the value directly instead of a reference to it.
llvm-svn: 281311
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.
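A short C sketch of why the outermost size matters only for allocation. The helper names are hypothetical: linearizing an index uses only the inner dimension sizes, while the element count to allocate is the product of all sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Linearizes idx[0..n-1] in row-major order; only the inner sizes
 * dims[1..n-1] are read, dims[0] is never used. */
static size_t linear_index(const size_t *idx, const size_t *dims, size_t n) {
  size_t off;
  assert(idx != NULL && dims != NULL && n > 0);
  off = idx[0];
  for (size_t d = 1; d < n; d++)
    off = off * dims[d] + idx[d];
  return off;
}

/* Number of elements to allocate; this does need the outermost size. */
static size_t num_elements(const size_t *dims, size_t n) {
  size_t total = 1;
  assert(dims != NULL);
  for (size_t d = 0; d < n; d++)
    total *= dims[d];
  return total;
}
```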
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D23991
llvm-svn: 281234
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_set_free always returns NULL and consequently later uses of the isl object
we just freed will never be reached. Without this knowledge, the analyser has
to issue a warning.
We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.
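The shape of such a refactoring can be sketched in C with a hypothetical object type whose free function always returns NULL, mirroring the behaviour of the isl free functions described above. The explicit `continue` makes it visible, also to a static analyser, that a freed object is never touched again in the same iteration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object; not an isl type. */
struct obj {
  int empty;
};

/* Like the isl free functions described above, always returns NULL. */
static struct obj *obj_free(struct obj *o) {
  free(o);
  return NULL;
}

/* Frees every object exactly once and counts the non-empty ones. */
static int consume_all(struct obj **objs, int n) {
  int used = 0;
  assert(objs != NULL);
  for (int i = 0; i < n; i++) {
    struct obj *o = objs[i];
    objs[i] = NULL;
    if (o == NULL)
      continue;
    if (o->empty) {
      obj_free(o);
      continue; /* abort this iteration, 'o' is not used again */
    }
    used++;
    obj_free(o);
  }
  return used;
}
```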
llvm-svn: 280940
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_union_map_free always returns NULL and consequently later uses of the isl
object we just freed will never be reached. Without this knowledge, the analyser
has to issue a warning.
We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.
llvm-svn: 280938
... but instead rely on the assumptions that we derive for load/store
instructions.
Before we were able to delinearize arrays, we used GEP pointer instructions
to derive information about the likely range of induction variables, which
gave us more freedom during loop scheduling. Today, this is not needed
any more as we delinearize multi-dimensional memory accesses and as part
of this process also "assume" that all accesses to these arrays remain
inbounds. The old derive-assumptions-from-GEP code has consequently become
mostly redundant. We drop it both to clean up our code and to improve
compile time. This change reduces the scop construction time for 3mm in
no-asserts mode on my machine from 48 to 37 ms.
llvm-svn: 280601
Without reductions we do not need a flat union_map schedule describing
the computation we want to perform, but can work purely on the schedule
tree. This reduces the dependence computation and scheduling time from 33 ms
to 25 ms, a further reduction of roughly 24%.
llvm-svn: 280558