A prevectorized loop may contain multiple statements, in which case
isl_schedule_node_band_sink will sink the vector band to multiple
leaves. Instead of statically assuming a specific tree structure after
sinking, add a SIMD marker to all inner bands.
Fixes llvm.org/PR52637
[NFC] As part of using inclusive language within the llvm project,
this patch replaces master with main when referring to `.chm` files.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D113299
This is part of an effort to reduce the differences between the custom C++ bindings currently used by Polly in lib/External/isl/include/isl/isl-noexceptions.h and the official isl C++ interface.
In the official interface, the type `isl::size` cannot be cast to an unsigned without first checking whether it contains a valid value using the function `isl::size::is_error()`.
For this reason, two helper functions have been added:
- `IslAssert`: asserts that no error is present in debug builds and only disables the mandatory error check in non-debug builds
- `unsignedFromIslSize`: casts the `isl::size` object to `unsigned`
Changes made:
- Add the functions `IslAssert` and `unsignedFromIslSize`
- Add the utility function `rangeIslSize()`
- Retype `MaxDisjunctsInDomain` from `int` to `unsigned`
- Retype `RunTimeChecksMaxAccessDisjuncts` from `int` to `unsigned`
- Retype `MaxDimensionsInAccessRange` from `int` to `unsigned`
- Replace some usages of `isl_size` with `unsigned`, since we aim not to use `isl_size` anymore
- `isl-noexceptions.h` has been generated by e704f73c88
No functional change intended.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113101
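The helper functions listed above can be sketched as follows. This is a minimal, self-contained illustration and not Polly's code: `Size` is a hypothetical stand-in for `isl::size` that records whether its error state has been checked, and this `rangeIslSize` returns a plain `std::vector` rather than an LLVM range.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for isl::size: holds a value (negative = error in
// this sketch) and remembers whether the caller checked the error state.
class Size {
public:
  explicit Size(int V) : Val(V) {}
  bool is_error() const {
    Checked = true;
    return Val < 0;
  }
  bool wasChecked() const { return Checked; }
  unsigned release() const {
    assert(Checked && "error state must be checked before use");
    return static_cast<unsigned>(Val);
  }

private:
  int Val;
  mutable bool Checked = false;
};

// Debug builds: assert that there is no error. Release builds: only mark
// the error state as checked so that the mandatory check is satisfied.
inline void IslAssert(const Size &S) {
#ifdef NDEBUG
  (void)S.is_error();
#else
  assert(!S.is_error());
#endif
}

// Convert to unsigned after verifying that the size is valid.
inline unsigned unsignedFromIslSize(const Size &S) {
  IslAssert(S);
  return S.release();
}

// Indices in [Begin, End); empty if Begin >= End.
inline std::vector<unsigned> rangeIslSize(unsigned Begin, const Size &End) {
  unsigned UEnd = unsignedFromIslSize(End);
  std::vector<unsigned> R;
  for (unsigned I = Begin; I < UEnd; ++I)
    R.push_back(I);
  return R;
}
```

Centralizing the check in one helper keeps call sites short while still making the error check explicit, which is what the official interface demands.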
Polly is trying to move towards using isl::ast_expr / isl-noexceptions.h
(which implements RAII) where possible instead of manually managing memory.
checkIslAstExprInt manually frees Expr, so it has been removed to be
more idiomatic and consistent.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111769
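The difference between manual memory management and the RAII style can be illustrated with a self-contained sketch. `c_expr`, `c_expr_alloc` and `c_expr_free` are hypothetical stand-ins for an isl C API object and its allocation and free functions. The `std::unique_ptr` wrapper plays the role that a class like `isl::ast_expr` plays in isl-noexceptions.h, releasing the object in its destructor.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API: an object that must be freed explicitly.
struct c_expr {
  long Value;
};
static int LiveObjects = 0;

c_expr *c_expr_alloc(long V) {
  ++LiveObjects;
  return new c_expr{V};
}
void c_expr_free(c_expr *E) {
  if (E) {
    --LiveObjects;
    delete E;
  }
}

// RAII wrapper: the destructor frees the object, so no code path can leak
// it or free it twice.
struct ExprDeleter {
  void operator()(c_expr *E) const { c_expr_free(E); }
};
using ManagedExpr = std::unique_ptr<c_expr, ExprDeleter>;

// Manual style: the function consumes its argument and every exit path
// must remember to free it.
bool isZeroManual(c_expr *E) {
  bool Result = E->Value == 0;
  c_expr_free(E);
  return Result;
}

// RAII style: ownership stays with the wrapper; nothing to free here.
bool isZeroManaged(const ManagedExpr &E) { return E->Value == 0; }
```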
This patch removes the broken bash script (polly.sh) and fixes the broken setup
instructions in get_started.html. It also adds instructions for using Ninja and
links to the LLVM getting started page.
Reviewed By: Meinersbur, InnovativeInventor
Differential Revision: https://reviews.llvm.org/D111685
Instead of being inline and having a neverCalled() workaround to make it
work in the debugger, define it as a regular exported function.
Also add overloads for the C API types isl_* so it works with managed as
well as unmanaged ISL objects.
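The overload pattern described above can be sketched with hypothetical names: `raw_set` stands in for an `isl_*` C type, `ManagedSet` for its C++ wrapper, and `dumpObj` for the exported debug function. Because the functions are regular, non-inline functions, they are emitted in the binary and can be called from a debugger without a workaround. The wrapper overload simply forwards to the C-type overload.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical C API type and its printer.
struct raw_set {
  int Elements;
};
std::string raw_set_to_str(const raw_set *S) {
  return S ? "{ " + std::to_string(S->Elements) + " elements }" : "null";
}

// Hypothetical managed wrapper (non-owning here for brevity).
class ManagedSet {
public:
  explicit ManagedSet(raw_set *S) : Ptr(S) {}
  raw_set *get() const { return Ptr; }

private:
  raw_set *Ptr;
};

// Overload for unmanaged (C API) objects: does the actual printing.
void dumpObj(const raw_set *S, std::ostream &OS = std::cerr) {
  OS << raw_set_to_str(S) << '\n';
}

// Overload for managed objects: forwards to the C-type overload.
void dumpObj(const ManagedSet &S, std::ostream &OS = std::cerr) {
  dumpObj(S.get(), OS);
}
```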
Commit 573531fb1f fixed the colon at the
end of a CHECK line (was a semicolon by mistake). With the check
enabled, it turned out that it was failing. Check for the correct
content.
Also add the missing colon to the next CHECK line.
When the option -polly-loopfusion-greedy is set, the ScheduleOptimizer
tries to aggressively fuse any bands it can without violating any
dependences.
As part of the implementation, the functionality for copying a band
into a new schedule was extracted out of the ScheduleTreeRewriter.
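The greedy strategy can be pictured with a toy model. This is not Polly's implementation: each band is reduced to a list of statement names, and a hypothetical `CanFuse` callback stands in for the dependence check the real optimizer performs on the isl schedule tree. Each band is merged into its predecessor whenever the check allows it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy band: just the names of the statements it contains.
using Band = std::vector<std::string>;
// Hypothetical legality check: may band A be fused with the following band B
// without violating dependences?
using FuseCheck = std::function<bool(const Band &, const Band &)>;

// Greedily fuse each band into its predecessor whenever the check allows it.
std::vector<Band> fuseGreedily(const std::vector<Band> &Bands,
                               const FuseCheck &CanFuse) {
  std::vector<Band> Result;
  for (const Band &B : Bands) {
    if (!Result.empty() && CanFuse(Result.back(), B)) {
      Result.back().insert(Result.back().end(), B.begin(), B.end());
      continue;
    }
    Result.push_back(B);
  }
  return Result;
}
```

Being greedy, this fuses as early as possible and never reconsiders a decision, which matches the "fuse any band it can" description but is not guaranteed to find the best grouping.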
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
As described in D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant, which can be forward declared.
In some build environments, the C++ compiler is unable to infer the
correct type for the DenseMap::insert in isErrorBlock. Typing out
std::make_pair helps.
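A minimal sketch of the workaround, using `std::unordered_map` as a stand-in for `llvm::DenseMap` (both provide an `insert` that returns an iterator/bool pair). Spelling out `std::make_pair` gives the argument a concrete `std::pair` type instead of leaving the compiler to match a braced initializer list against the `insert` overloads. The cache and its key type are hypothetical.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical cache from a block id to "is an error block".
using ErrorBlockCache = std::unordered_map<int, bool>;

bool cacheResult(ErrorBlockCache &Cache, int BlockId, bool IsError) {
  // A braced list, Cache.insert({BlockId, IsError}), relies on the compiler
  // picking the right insert overload; make_pair states the type explicitly.
  auto Inserted = Cache.insert(std::make_pair(BlockId, IsError));
  // True if a new entry was added, false if the key already existed.
  return Inserted.second;
}
```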
This is a follow-on to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path.
We can still do much better here (on both paths), but this is our first step.
Differential Revision: https://reviews.llvm.org/D111003
This fixes a violation of the wrap flag rules introduced in c4048d8f. This was also noted in the (very old) PR23527.
The issue being fixed is that we assumed the inbounds flag on any GEP implied that all users of *any* GEP (or add) which happens to map to that SCEV would also be UB if the (other) GEP overflowed. That's simply not true.
In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics.
The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow-up ideas mentioned in PR51817.
It's worth noting that most of the changes are analysis-result-only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that.
Differential Revision: https://reviews.llvm.org/D109789
multiplication
The following code modifies elements of the array D.
    for (i = 0; i < _PB_NI; i++)
      for (j = 0; j < _PB_NJ; j++)
        {
          for (k = 0; k < _PB_NK; k++)
            {
              double Mul = A[i][k] * B[k][j];
              D[i][j][k] += Mul;
              C[i][j] += Mul;
            }
        }
Nevertheless, the code is recognised as a matrix-matrix multiplication, since
the second and third dimensions of D are accessed with non-zero strides.
This fixes the typo, which was made during the translation to C++ bindings
(https://reviews.llvm.org/D35845).
Reviewed By: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D110491
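Why D disqualifies this loop nest can be sketched with a simplified model. This is not Polly's detection code: each access is modelled as a hypothetical matrix of coefficients of the loop variables (i, j, k), one row per array dimension. In a matrix-matrix multiplication the accumulated array must not move with the reduction loop k, which holds for C[i][j] but not for D[i][j][k].

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of loop k in the (i, j, k) coefficient triple.
constexpr std::size_t LoopK = 2;

// One row per array dimension: coefficients of (i, j, k) in that subscript.
using Access = std::vector<std::array<int, 3>>;

// Simplified test: every dimension must have a zero stride with respect to
// the reduction loop k.
bool isInvariantInReduction(const Access &A) {
  for (const auto &Dim : A)
    if (Dim[LoopK] != 0)
      return false;
  return true;
}
```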
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.
This patch rejects regions entered by an indirectbr/callbr so that they
do not fail later at code generation.
This fixes llvm.org/PR51964
Recommit with "REQUIRES: asserts" in test that uses statistics.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.
This patch rejects regions entered by an indirectbr/callbr so that they
do not fail later at code generation.
This fixes llvm.org/PR51964
Inline assembly was not handled at all and treated like a llvm::Value.
In particular, it tried to create a pointer to it, which is not allowed.
Fix by handling it like an llvm::Constant such that it is just reused when
required, instead of trying to marshal it in memory.
Fixes llvm.org/PR51960
VirtualUse ensures consistency across the different sources of values
within Polly. In particular, this enables the use of instructions moved
between statements. Before this patch, the code wrongly assumed that a
BB's instructions are also the ScopStmt's instructions. Referenced values
are determined for OpenMP outlining and GPGPU kernel extraction.
GPGPU CodeGen had some problems. For one, it generated GPU kernel
parameters for constants. Second, it emitted GPU-side invariant loads
which have already been loaded by the host. This has been partially
fixed: it still generates a store for the invariant load result, but
now uses the value that the host has already written.
WARNING: I did not test the generated PollyACC code on an actual GPU.
The improved consistency will be made use of in the next patch.
The function was intended to catch OpenMP functions such as
get_thread_id(). If matched, the call would be considered synthesizable.
There were a few problems with this:
* get_thread_id() is not 'const' in the sense in which the gcc manual
  defines it: "do not examine any values except their arguments".
  get_thread_id() reads OpenCL runtime library global state.
  What was intended was probably 'speculable'.
* isConstCall was implemented using mayReadOrWriteMemory(). 'const' is
  stricter than that: mayReadOrWriteMemory is e.g. true for malloc(),
  since it may only read/write addresses that are considered
  inaccessible for the application. However, malloc is certainly not
  speculable.
* Values for which isConstCall returned true were not handled
  consistently throughout Polly. In particular, they were not considered
  for referenced values (OpenMP outlining and PollyACC).
Fix by removing special handling for isConstCall entirely.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.
It interprets the same metadata as the LoopDistribute pass.
Re-apply after revert in c7bcd72a38 with
fix: Take isBand out of #ifndef NDEBUG since it is now used
unconditionally.
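Loop distribution splits one loop into several loops over the same iteration space, each containing a subset of the original statements. A minimal sketch applying the transformation by hand (not Polly's code; the arrays and statements are made up) shows the equivalence a correctness check must guarantee. Here it holds because S2 only reads the A[i] written by S1 in the same iteration, and the first distributed loop has written every A[i] before the second loop reads it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: both statements in one body.
void fused(std::vector<int> &A, std::vector<int> &B,
           const std::vector<int> &X) {
  for (std::size_t i = 0; i < X.size(); ++i) {
    A[i] = X[i] * 2; // S1
    B[i] = A[i] + 1; // S2 reads the value S1 just wrote
  }
}

// Distributed: S1 and S2 each get their own loop.
void distributed(std::vector<int> &A, std::vector<int> &B,
                 const std::vector<int> &X) {
  for (std::size_t i = 0; i < X.size(); ++i)
    A[i] = X[i] * 2; // S1
  for (std::size_t i = 0; i < X.size(); ++i)
    B[i] = A[i] + 1; // S2
}
```

Had S2 read A[i + 1] instead, the fused loop would see the old value while the distributed version would see the new one, so distributing would be illegal.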
The name of the option is misleading and has been renamed by isl to
"serialize-sccs". Instead of also renaming the option, remove it.
The option is still accessible using
-polly-isl-arg=--no-schedule-serialize-sccs
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.
It interprets the same metadata as the LoopDistribute pass.
This metadata was intended to mark all accesses within an iteration to be pairwise non-aliasing, in this case because every memory location of a base pointer is touched (read or written) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.
The implementation had multiple issues:
* The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.
* This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as many times as the loop not used in its subscript iterates.
* Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.
* Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues. Especially after unrolling, there would probably be at least one SCEV per instruction and unroll instance.
* The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it as noalias as well.
A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is what scev-aa does as well.
This effectively reverts D30606 and D35761.
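The on-the-fly construction of the !noalias scope list can be reproduced with a toy model. This is a hypothetical sketch, not the removed code: scopes are plain integers, and each visited access receives the list of other scopes seen so far. The first access never learns about scopes that only appear later.

```cpp
#include <cassert>
#include <vector>

// For each access (identified by its scope id, in visiting order), compute
// the noalias list as it would look if built incrementally during the walk.
std::vector<std::vector<int>>
incrementalNoAlias(const std::vector<int> &Scopes) {
  std::vector<int> Seen;
  std::vector<std::vector<int>> Result;
  for (int S : Scopes) {
    std::vector<int> Others;
    for (int T : Seen)
      if (T != S)
        Others.push_back(T);
    Result.push_back(Others);
    Seen.push_back(S);
  }
  return Result;
}
```

With accesses in scopes 0, 1, 2, the first access ends up with an empty list instead of {1, 2}, which is exactly the missed aliasing information described above.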
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
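The API change can be sketched with a hypothetical `MDInfo` struct standing in for `AAMDNodes`. Its fields and merge rule are invented for illustration. The old style fills a caller-provided structure and optionally merges into it, while the new style returns the value and leaves merging to a separate `merge()` call.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for AAMDNodes.
struct MDInfo {
  std::string TBAA;
  std::string Scope;

  // Illustrative merge rule: keep a field only if both sides agree.
  MDInfo merge(const MDInfo &Other) const {
    return {TBAA == Other.TBAA ? TBAA : "", Scope == Other.Scope ? Scope : ""};
  }
};

// Hypothetical instruction carrying metadata.
struct Inst {
  MDInfo MD;
};

// Old style: populate an out-parameter, optionally merging into it.
void getMetadataOld(const Inst &I, MDInfo &N, bool Merge) {
  N = Merge ? N.merge(I.MD) : I.MD;
}

// New style: return the value; callers merge explicitly if needed.
MDInfo getMetadataNew(const Inst &I) { return I.MD; }
```

The new form lets a caller write `getMetadataNew(A).merge(getMetadataNew(B))` in one expression instead of declaring a structure first and passing a flag.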
Building a source distribution using autotools adds GPL-licensed
files into the sources. Although redistribution of these files is
explicitly allowed with an exception, they are not used by Polly,
which uses a CMake replacement. Use the direct source checkout
instead (replacing the output of 'make dist').
Some m4 scripts with the same licence are also included in the isl/ppcg
repositories. Removing them renders the autotools-based build scripts
inoperable, so remove the autotools build system altogether.