Commit Graph

1526 Commits

Author SHA1 Message Date
Michael Kruse e470f9268a [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.

Re-apply after revert in c7bcd72a38 with
fix: Take isBand out of #ifndef NDEBUG since it now is used
unconditionally.
2021-09-23 21:11:01 -05:00
Petr Hosek c7bcd72a38 Revert "[Polly] Implement user-directed loop distribution/fission."
This reverts commit 52c30adc7d which
breaks the build when NDEBUG is defined.
2021-09-23 14:04:25 -07:00
Michael Kruse 07e7cb9433 [Polly] Remove -polly-opt-fusion option.
The name of the option is misleading and has been renamed by isl to
"serialize-sccs". Instead of also renaming the option, remove it.
The option is still accessible using

    -polly-isl-arg=--no-schedule-serialize-sccs
2021-09-23 15:43:08 -05:00
Michael Kruse 35f7020098 [Polly] Dissolve Isl test directory. NFC.
All tests use ISL, integrate its subfolder into the components they
belong to.
2021-09-22 17:45:07 -05:00
Michael Kruse 52c30adc7d [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.
2021-09-22 17:28:25 -05:00
Michael Kruse cad9f98a2a [Polly] Don't generate inter-iteration noalias metadata.
This metadata was intended to mark all accesses within an iteration to be pairwise non-aliasing, in this case because every memory of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

Rhe implemention had multiple issues:

 * The structure of the noalias metadata was malformed. D110026 added check in the verifier for this metadata, and the tests were failing since then.

 * This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index is iterating.

 * Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scop list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.

 * Since the !noalias scope list would ideally consists of all other SCEV for this base pointer, we might run quickly into scalability issues. Especially after unrolling there would probably at least once SCEV per instruction and unroll instance.

 * The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it to noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is the what scev-aa does as well.

This effectively reverts D30606 and D35761.
2021-09-20 22:20:17 -05:00
Nikita Popov 53720f74e4 [Polly] Partially fix scoped alias metadata
This partially addresses the verifier failures caused by D110026.
In particular, it does not fix the "second level" alias metadata.
2021-09-20 22:51:31 +02:00
Michael Kruse ca5f05d2df [Polly][test] Add dependency to count.
Polly does not use the count program itself, but somewhere in lit it is
expected to exists. Otherwise, the following error occurs:

    llvm-lit: llvm-project/llvm/utils/lit/lit/llvm/subst.py:133: fatal: Did not find count in ./bin
2021-08-28 22:50:07 -05:00
Michael Kruse ffa39b4582 [Polly] Fix dumpfunction.ll test. 2021-08-28 22:43:07 -05:00
Michael Kruse e4f3f2c0c5 [Polly] Don't prune non-external function itself from dump. 2021-08-28 17:06:53 -05:00
Michael Kruse 1537563104 [Polly][test] Add missing %loadPolly.
This fixes check-polly when using the -load mechanism,
i.e. LLVM_POLLY_LINK_INTO_TOOLS=OFF.
2021-08-24 13:47:25 -05:00
Michael Kruse 955b91c19c [Polly] Never consider non-SCoP blocks as error blocks.
Code outside the SCoP will be executed recardless of the code versioning
runtime check introduced by CodeGeneration. Assumption made based on
that these are never executed in Polly-optimized code does not hold.

This fixes the miscompilation of MultiSource/Applications/lambda-0.1.3
2021-08-23 01:04:01 -05:00
Michael Kruse 9cfab5e249 [Polly] Add support for -polly-dump-before/after with NPM.
The new pass manager does not allow adding module passes at the
-polly-position=before-vectorizer extension point. Introduce a
DumpFunctionPass that dumps only current function. In contrast to the
legacy pass manager's -polly-dump-before, each function will be dumped
into its own file. -polly-dump-before-file is still not supported.

The DumpFunctionPass uses llvm::CloneModule to copy the current function
into a new module and then write it into a file.
2021-08-22 20:43:35 -05:00
Eli Friedman 3f2828dc28 [polly] Fix up regression test config with current features.
Primarily, configure substitutions so we can copy-paste the "RUN" line
of failed tests without worrying about the paths.
2021-07-30 13:44:48 -07:00
Riccardo Mori ec3da1a43f Update isl to isl-0.24-69-g54aac5ac
This is needed for having the functions isl_{set,map}_n_basic_{set,map}
exported to the C++ interface.
Some tests have been modified to reflect the isl changes.
2021-07-27 17:38:12 +02:00
Michael Kruse 84046ebd95 [Polly] Fix test after D104732.
The SCEV analysis has been improved to identify a write access as a MustWrite.
2021-06-23 14:59:53 -05:00
Bjorn Pettersson 6aac2773d8 [polly][GPGPU] Fixup related to overloading exponent type in llvm.powi
Commit 4c7f820b2b changed the llvm.powi intrinsic to support
different 'int' sizes for the exponent. That happened to break
the IntrinsicToLibdeviceFunc mapping in PPCGCodeGeneration, which
obviously should have been updated as part of commit 4c7f820b2b
(https://reviews.llvm.org/D99439).

The shortcoming was found by buildbots that use
   -DPOLLY_ENABLE_GPGPU_CODEGEN=ON

This patch should fixup the problem.
2021-06-18 08:59:06 +02:00
Michael Kruse a56bd7dec8 [Polly][Matmul] Re-pack A in every iteration.
Packed_A must be copied repeatedly, not just for the first iteration of
the outer tile.

This fixes llvm.org/PR50557
2021-06-09 15:19:52 -05:00
Eli Friedman fd229caa01 [polly] Fix SCEVLoopAddRecRewriter to avoid invalid AddRecs.
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense.  This triggers an assertion complaining
it's not loop-invariant.

Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.

Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.

Differential Revision: https://reviews.llvm.org/D102959
2021-06-01 09:51:05 -07:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Michael Kruse ad568f4286 [Polly] Add support for -polly-dump-after(-file) with the NPM.
For the same reason as with -polly-dump-before, it is only supported
with -polly-position=early.
2021-05-17 22:20:47 -05:00
Michael Kruse 29bef8e4e3 [Polly] Add support for -polly-dump-before(-file) with the NPM.
Only supported with -polly-position=early. Unfortunately, the
extension point callpack for VectorizerStart only passes a
FunctionPassManager, making it impossible to add a module pass.
2021-05-17 20:58:37 -05:00
Michael Kruse 5aafcb2b44 [Polly] Add support for -polly-position=early with the NPM.
This required support for the canonicalization passes, inlcuding
porting RewriteByReferenceParams to the NPM.

For some reason, the legacy pass pipeline with -polly-position=early did
not run the CodePreparation pass. This was fixed as well.
2021-05-14 12:55:03 -05:00
Michael Kruse 286677870b [Polly][ManualOpt] Match interpretation of unroll metadata to LoopUnrolls's.
We previously had a different interpretation of unroll transformation
attributes than how LoopUnroll interpreted it. In particular,
llvm.loop.unroll.enable was needed explicitly to enable it and disabling
metadata was ignored.
Additionally, it required that either full unrolling or an unroll factor
to be specified or fail otherwise. An unroll factor is still required,
but the transformation is ignored with the hope that LoopUnroll is going
to apply the unrolling, since Polly currently does not implement an
heuristic.

Fixes llvm.org/PR50109
2021-04-24 04:30:19 -05:00
Roman Lebedev 2aff4f7f57
[polly] Fix check-polly after SCEVExpander PtrToInt fixes 2021-04-19 19:10:55 +03:00
Michael Kruse 8796451d6e [Polly] Port DeadCodeElim to the NewPM. 2021-03-24 01:01:29 -05:00
Michael Kruse f51427afb5 [Polly][Unroll] Fix unroll_double test.
We enumerated the cross product Domain x Scatter, but sorted only be the
scatter key. In case there are are multiple statement instances per
scatter value, the order between statement instances of the same loop
iteration was undefined.

Propertly enumerate and sort only by the scatter value, and group the
domains using the scatter dimension again.

Thanks to Leonard Chan for the report.
2021-03-16 09:00:42 -05:00
Michael Kruse 3f170eb197 [Polly][Optimizer] Apply user-directed unrolling.
Make Polly look for unrolling metadata (https://llvm.org/docs/TransformMetadata.html#loop-unrolling) that is usually only interpreted by the LoopUnroll pass and apply it to the SCoP's schedule.

While not that useful by itself (there already is an unroll pass), it introduces mechanism to apply arbitrary loop transformation directives in arbitrary order to the schedule. Transformations are applied until no more directives are found. Since ISL's rescheduling would discard the manual transformations and it is assumed that when the user specifies the sequence of transformations, they do not want any other transformations to apply. Applying user-directed transformations can be controlled using the `-polly-pragma-based-opts` switch and is enabled by default.

This does not influence the SCoP detection heuristic. As a consequence, loop that do not fulfill SCoP requirements or the initial profitability heuristic will be ignored. `-polly-process-unprofitable` can be used to disable the latter.

Other than manually editing the IR, there is currently no way for the user to add loop transformations in an order other than the order in the default pipeline, or transformations other than the one supported by clang's LoopHint. See the `unroll_double.ll` test as example that clang currently is unable to emit. My own extension of `#pragma clang loop` allowing an arbitrary order and additional transformations is available here: https://github.com/meinersbur/llvm-project/tree/pragma-clang-loop. An effort to upstream this functionality as `#pragma clang transform` (because `#pragma clang loop` has an implicit transformation order defined by the loop pipeline) is D69088.

Additional transformations from my downstream pragma-clang-loop branch are tiling, interchange, reversal, unroll-and-jam, thread-parallelization and array packing. Unroll was chosen because it uses already-defined metadata and does not require correctness checks.

Reviewed By: sebastiankreutzer

Differential Revision: https://reviews.llvm.org/D97977
2021-03-15 13:05:39 -05:00
Roman Lebedev 78b8ce40ef
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db4,
and relands commit 61f006ac65.

This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-13 16:05:34 +03:00
Roman Lebedev 329aeb5db4
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot:
https://lab.llvm.org/buildbot/#/builders/85/builds/3062
https://reviews.llvm.org/D98147#2623549

It looks like LSR needs some kind of a change around insertion point handling.
Reverting until i have a fix.

This reverts commit 61f006ac65.
2021-03-13 09:10:28 +03:00
Roman Lebedev 61f006ac65
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-12 22:11:58 +03:00
Roman Lebedev f449e5ef9b
[NFCI] Fix polly tests after b46c085d2b
That commit changed SCEVExpander to emit intrinsics instead of icmp+select,
but i forgot about polly, and i'm not sure if any bots complained.
2021-03-07 20:44:04 +03:00
Michael Kruse b85c98b4c5 [Polly][Codegen] Emit access group metadata.
Emit llvm.loop.parallel_accesses metadata instead of
llvm.mem.parallel_loop_access. The latter is deprecated because it
assumes that LoopIDs are persistent, which they are not.
We also emit parallel access metadata for all surrounding parallel
loops, not just the innermost parallel.
2021-03-04 03:58:03 -06:00
Michael Kruse 91c472c86c [Polly] Fix test after D96534. 2021-02-19 12:49:29 -06:00
Michael Kruse 089421ba9a [Polly] Test all optimization levels. 2021-02-14 00:31:10 -06:00
Michael Kruse 95ef556bd1 [Polly] Preserve DetectionContext references.
DetectionContext objects are stored as values in a DenseMap. When the
DenseMap reaches its maximum load factor, it is resized and all its
objects moved to a new memory allocation. Unfortunately Scop object have
a reference to its DetectionContext. When the DenseMap resizes, all the
DetectionContexts reference now point to invalid memory, even if caused
by an unrelated DetectionContext.

Even worse, NewPM's ScopPassManager called isMaxRegionInScop with the
Verify=true parameter before each pass. This caused the old
DetectionContext to be removed an a new on created and re-verified.
Of course, the Scop object was already created pointing to the old
DetectionContext. Because the new DetectionContext would
usually be stored at the same position in the DenseMap, the reference
would usually reference the new DetectionContext of the same Region.
Usually.
If not, the old position still points to memory in the DenseMap
allocation (unless also a resizing occurs) such that tools like Valgrind
and AddressSanitizer would not be able to diagnose this.

Instead of storing the DetectionContext inside the DenseMap, use a
std::unique_ptr to a DetectionContext allocation, i.e. it will not move
around anymore. This also allows use to remove the very strange

    DetectionContext(const DetectionContext &&)

copy/move(?) constructor. DetectionContext objects now are neither
copied nor moved.

As a result, every re-verification of a DetectionContext will use a new
allocation. Therefore, once a Scop object has been created using a
DetectionContext, it must not be re-verified (the Scop data structure
requires its underlying Region to not change before code generation
anyway). The NewPM may call isMaxRegionInScop only with
Validate=false parameter.
2021-02-13 03:36:09 -06:00
Michael Kruse d50f92a4f0 [Polly] Added dedicated test for working -O3 pipeline.
Test the NewPM as well as the legacy PM.
2021-02-10 13:25:56 -06:00
Michael Kruse 11511ee343 [Polly] Do not use -O3 pipeline for single pass test. 2021-02-10 13:25:56 -06:00
Michael Kruse e200df952b [Polly] Port IslScheduleOptimizer to the NewPM. 2021-02-09 23:56:21 -06:00
Michael Kruse b687fc9122 [Polly] Port PruneUnprofitable to the NewPM. 2021-02-09 23:56:20 -06:00
Michael Kruse 7903d594ea [Polly] Port DeLICM to the NewPM. 2021-02-09 23:56:19 -06:00
Michael Kruse 4c64d8ee3a [Polly] Port ForwardOpTree to the NewPM. 2021-02-09 23:56:19 -06:00
Michael Kruse 3dcb535115 [Polly] Remove use of -O3 in regression test.
In addition to that regression tests should not test the intire pass
pipeline (unless they are testing the pipeline itself), the Polly-ACC
currently does not support the new pass manager. If enabled by default,
such tests will therefore fail.

Use the -polly-gpu-runtime and -polly-gpu-arch options also as default
values for the PPCGCodeGeneration pass. This requires to move the option
to be moved from the pipeline-building Register passes to the
PPCGCodeGeneration implementation.

Fixes the spir-typesize.ll buildbot fail.
2021-02-09 18:13:35 -06:00
Arthur Eubanks 781a1b1e36 [test] Pin spir-codegen.ll to legacy PM
-polly-enable-delicm is not supported under the new PM but is tested here:
  Assertion `!EnableDeLICM && "This option is not implemented"' failed.
2021-02-03 19:37:32 -08:00
Michael Kruse 3b9677e1ec [Polly] Track defined behavior for PHI predecessor computation.
ZoneAlgorithms's computePHI relies on being provided with consistent a
schedule to compute the statement prodecessors of a statement containing
PHINodes. Otherwise unexpected results such as PHI nodes with multiple
predecessors can occur which would result in problems in the
algorithms expecting consistent data.

In the added test case, statement instances are scrubbed from the
SCoP their execution would result in undefined behavior (Due to a nsw
overflow). As already being undefined behavior in LLVM-IR, neither
AssumedContext nor InvalidContext are updated, giving computePHI no
means to avoid these cases.

Intoduce a new SCoP property, the DefinedBehaviorContext, that among
the runtime-checked conditions, also tracks the assumptions not needing
a runtime check, in particular those affecting the assumed control flow.
This replaces the manual combination of the 3 other contexts that was
already done in computePHI and setNewAccessRelation. Currently, the only
additional assumption is that loop induction variables will nsw flag for
not wrap, but potentially more can be added. Use in
hasFeasibleRuntimeContext, isl::ast_build and gisting are other
potential uses.

To limit computational complexity, the DefinedBehaviorContext is not
availabe if it grows too large (atm hardcoded to 8 disjuncts).

Possible other fixes include bailing out in computePHI when
inconsistencies are detected, choose an arbitrary value for inconsistent
cases (since it is undefined behavior anyways), or make the code
receiving the result from ComputePHI handle inconsistent data. All of
them reduce the quality of implementation having to bail out more often
and disabling the ability to assert on actually wrong results.

This fixes llvm.org/PR48783.
2021-01-23 13:03:49 -06:00
Michael Kruse a5b895110f [Polly] Gist new access relations using the SCoP context.
This simplifies the access relations.
2021-01-23 13:03:48 -06:00
Arthur Eubanks cabe1b1124 [polly][NewPM][test] Fix polly tests under -enable-new-pm
In preparation for turning on opt's -enable-new-pm by default, this pins
uses of passes via the legacy "opt -passname" with pass names beginning
with "polly-" and "polyhedral-info" to the legacy PM. Many of these
tests use -analyze, which isn't supported in the new PM.

(This doesn't affect uses of "opt -passes=passname").

rL240766 accidentally removed `-polly-prepare` in
phi_not_grouped_at_top.ll, and it also doesn't use the output of
-analyze.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D94266
2021-01-19 12:38:58 -08:00
Michael Kruse 842314b5f0 [Polly] Update isl to isl-0.23-61-g24e8cd12.
This fixes llvm.org/PR48554

Some test cases had to be updated because the hash function for
union_maps have been changed which affects the output order.
2021-01-19 12:01:31 -06:00
Juneyoung Lee 278aa65cc4 [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93793
2020-12-30 04:21:04 +09:00
Michael Kruse bc633fe46b [Polly] Consider InvalidContext to determine partial READ.
MemoryAccess::setNewAccessRelation() in assert-builds checks whether the
access relation for a READ has a memory location for every instance of
the domain. Otherwise, we would not have value to load from. That check
already considered that instances outside the Scop's context do not
matter since they are never executed (or would be undefined behavior).
In this patch also take instances of the InvalidContext into account,
as these can also be assumed to never occur. InvalidContext was
introduced to avoid the computational complexity of subtracting
restrictions from the AssumedContext. However, this additional check in
setNewAccessRelation is only done in assert-builds.

The assertion case with an InvalidContext may occur with DeLICM on a
conditionally infinite loops, as it is the case in the following code:

    for (int i = 0; i < n; i+=b)
      vreg = ...;
    *Dest = vreg;

The loop is infinite when b=0, and [b] -> { : b = 0 }  is part of the
InvalidContext. When DeLICM tries to map the memory for %vreg to *Dest,
there is no store instance that uses the value of vreg when b = 0, hence
no location to map it to. However, the case is irrelevant since Polly's
runtime condition check ensures that this is never case.

Fixes llvm.org/PR48445
2020-12-10 22:25:19 -06:00
Michael Kruse 6249bfeefe [Polly][CodeGen] Remove use of ScalarEvolution.
ScalarEvolution::getSCEV cannot be used during codegen. ScalarEvolution
assumes a stable IR and control flow which is under construction during
Polly's CodeGen. In particular, it uses DominatorTree for compute the
backedge taken count. However the DominatorTree is not updated during
codegen.

In this case, SCEV was used to determine the base pointer of an array
access. Replace it by our own function. Polly generates only GEP and
BitCasts for array acceses, i.e. it is sufficient to handle these to to
find the base pointer.

Fixes llvm.org/PR48422
2020-12-07 15:21:51 -06:00
Michael Kruse c8a0e27cfb [Polly][OpTree] Fix mid-processing change of access kind.
Operand tree forwarding can cause the change of an access kind; in
particular change from a scalar kind to an array kind if the scalar
dependency is not necessary. Such an access cannot and doesn't need to
be forwarded anymore.

Fixes llvm.org/PR48034
2020-11-11 16:21:48 -06:00
Michael Kruse c1cf51e777 [Polly][OpTree] Better report applied changes.
Print to dbgs() any taken action.

Also, read-only scalars do not require any action unless
-polly-analyze-read-only-scalars=true is used. Better refect this by
using ForwardingAction::triviallyForwardable and thus not bumping the
statistics.
2020-11-11 16:21:48 -06:00
Michael Kruse e408935bb5 [Polly][ScopBuilder] Use only modeled instructions to compute statement granularity.
ScopBuilder distributes independent instructions between statements.
Only modeled (e.g. not synthesizable) instructions are represented.
To compute independence, non-modeled instructions were used in some
parts of determining instruction independence, which could lead to the
re-introduction of non-model instructions.

In particular, required invariant loads could be added to instruction
list, which then led to redundant MemoryAccesses for such a load.

This fixes llvm.org/PR48059.
2020-11-10 15:30:16 -06:00
Roman Lebedev b4916918e5
[SCEV] SCEVPtrToIntExpr simplifications
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89692
2020-10-30 11:13:35 +03:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
  to not loose pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Michael Kruse 7175cffb21 [Polly] Reuse multiple uses in operand tree.
Recursively traversing the operand tree leads to an exponential blowup
if instructions are used multiple times due to every path leading to an
additional copy of the instructions after forwarding. This problem was
marked as a TODO in the code and was reported as a bug in llvm.org/PR47340.

Fix by caching already visited instructions and returning the cached
version when already visited. Instead of calling forwardTree() twice,
return a ForwardingAction structure that contains a lambda which will
carry-out the forwarding when requested. The lambdas are executed in
reverse-postorder to mimic the previous recursive calls unless there
is a reuse.

Fixes llvm.org/PR47340
2020-10-20 18:05:35 -05:00
Mark Schimmel 8e570abf10 Polly - specify address space when creating a pointer to a vector type
Polly incorrectly dropped the address space specified for a load instruction when it vectorized the code.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D88907
2020-10-14 11:17:15 -05:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end there with constant zext/sext of a pointer,
so i didn't handle those cases there until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 23:02:55 +03:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
Pengxuan Zheng deb00cf0b5 [Polly][NewPM] Port Simplify to the new pass manager
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D87328
2020-09-20 19:18:01 -07:00
Michael Kruse 6983741eaa [Polly] Fix use-after-free.
VirtualUse of type UseKind::Inter expects the definition of a
llvm::Value to be represented in another statement. In the bug report
that statement has been removed due to its domain being empty.
Scop::InstStmtMap for the llvm::Value's defintion still pointed to the
removed statement, which resulted in the use-after-free.

The defintion statement was removed by Simplify because it was
considered to not be reachable by other uses; trivially because it is
never executed due to its empty domain. However, no such thing happend
to the using statement using the value altough its domain is also empty.

Fix by always removing statements with empty domains in Simplify since
these are not properly analyzable. A UseKind::Inter should always have a
statement with its defintion due to LLVM's SSA form.
Scop::removeStmtNotInDomainMap() also removes statements with empty
domains but does so without considering the context as used by
Simplify's analyzes.

In another angle, InstStmtMap pointing to removed statements should not
happen either and ForwardOpTree would have bailed out if the llvm::Value
definition was not represented by a statement. This will be corrected in
a followup-commit.

This fixes llvm.org/PR47098
2020-08-22 10:10:49 -05:00
Michael Kruse a54eb9b7c5 [Polly] Update isl to isl-0.22.1-416-g61d6dc75.
This fixes llvm.org/PR47104
2020-08-21 00:28:44 -05:00
Michael Kruse 1139d899d5 [polly] Unbreak buildbot.
The test failed since commit
bc10888dc "DomTree: Make PostDomTree indifferent to block successors swap"
which is a re-commit of
c35585e20 "DomTree: Make PostDomTree immune to block successors swap"
2020-08-06 21:17:27 -05:00
Wei Wang 46ebb619bf [FIX] Resolve test failure in polly/test/ScopInfo/memcpy-raw-source.ll
scoped-noalias -> scoped-noalias-aa

reference: https://reviews.llvm.org/D84542

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D84720
2020-07-28 09:15:40 -07:00
serge-sans-paille 515bc8c155 Harmonize Python shebang
Differential Revision: https://reviews.llvm.org/D83857
2020-07-16 21:53:45 +02:00
Michael Kruse c0bc995429 [Polly] Fix prevectorization of fused loops.
The schedule of a fused loop has one isl_space per statement, such that
a conversion to a isl_map fails. However, the prevectorization is
interested in the schedule space only: Converting to the non-union
representation only after extracting the schedule range fixes the problem.

This fixes llvm.org/PR46578
2020-07-10 16:42:03 -05:00
Michael Kruse 32bf468420 [Polly] Fix -polly-opt-isl -analyze
The member LastSchedule was never set, such that printScop would always
print "n/a" instead of the last schedule.

To ensure that the isl_ctx lives as least as long as the stored
schedule, also store a shared_ptr.

Also set the schedule tree output style to ISL_YAML_STYLE_BLOCK to avoid
printing everything on a single line.

`opt -polly-opt-isl -analyze` will be used in the next commit.
2020-07-10 16:42:03 -05:00
Roman Lebedev a2619a60e4
Reland "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"
This reverts commit d3e3f36ff1,
which reverter the original commit 2c16100e6f,
but with polly tests now actually passing.
2020-07-06 18:00:22 +03:00
Arthur Eubanks b210c9899b [BasicAA] Replace -basicaa with -basic-aa in polly
Follow up to https://reviews.llvm.org/D82607.
2020-06-30 15:50:17 -07:00
Simon Pilgrim 74dc081ef2 Update polly tests to use -disable-basicaa to -disable-basic-aa
These were missed in rG4cd19a6e15120cb
2020-06-27 15:56:01 +01:00
Eli Friedman f26bdb539e Make Value::getPointerAlignment() return an Align, not a MaybeAlign.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072
2020-05-20 16:37:20 -07:00
Eli Friedman 1a6e4a2cf6 Fix polly tests after D79968. 2020-05-15 15:15:09 -07:00
Eli Friedman 4532a50899 Infer alignment of unmarked loads in IR/bitcode parsing.
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.

The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.

This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.

Differential Revision: https://reviews.llvm.org/D78403
2020-05-14 13:03:50 -07:00
Michael Kruse 1ef55ac96e [Polly] Fix long loop due to unsigned warparound.
After the update to ISL to isl-0.22.1-87-gfee05a13 and its change of
isl_*_dim returning -1 instead of 0, the -1 got wrapped-around to
UINT_MAX because Polly often uses 'unsigned' type to represent
dimensions, as ISL did before this patch. This may happen in normal
executions after an out-of-quota.

Fix by catching the error-case earlier.
2020-04-27 12:15:56 -05:00
Eli Friedman 9b9454af8a Require "target datalayout" to be at the beginning of an IR file.
This will allow us to use the datalayout to disambiguate other
constructs in IR, like load alignment. Split off from D78403.

Differential Revision: https://reviews.llvm.org/D78413
2020-04-20 11:55:49 -07:00
Eli Friedman 89e0662dee Make IRBuilder automatically set alignment on load/store/alloca.
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.

Differential Revision: https://reviews.llvm.org/D77984
2020-04-13 13:43:14 -07:00
Michael Kruse 4dded1a7cb [Polly] Add -polly-isl-arg command line option.
The option is passed as argv to ISL's command line option parser.

Polly's own own command line options take precedence over options passed
as `-polly-isl-arg`. For instance,
`-polly-isl-arg=--schedule-outer-coincidence` will be ignored in favor
of `-polly-opt-outer-coincidence`.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D77303
2020-04-06 08:56:57 -05:00
Yuanfang Chen 4ad7685258 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
2020-02-13 10:16:06 -08:00
Yuanfang Chen 17122ec10a Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""""
This reverts commit bb51d24330.
2020-02-13 10:08:05 -08:00
Yuanfang Chen bb51d24330 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
2020-02-13 10:02:53 -08:00
Michael Halkenhäuser 1e0be76e98 [Polly] LLVM OpenMP Backend -- Fix "static chunked" scheduling.
Static chunked OpenMP scheduling has not been treated correctly.
This patch fixes the problem that threads would not process their
(work-)chunks as intended.

Differential Revision: https://reviews.llvm.org/D61081
2020-02-11 12:51:35 -06:00
Michael Kruse e8227804ac [Polly] Update ISL to isl-0.22.1-87-gfee05a13.
The primary motivation is to fix an assertion failure in
isl_basic_map_alloc_equality:

    isl_assert(ctx, room_for_con(bmap, 1), return -1);

Although the assertion does not occur anymore, I could not identify
which of ISL's commits fixed it.

Compared to the previous ISL version, Polly requires some changes for this update

 * Since ISL commit
   20d3574 "perform parameter alignment by modifying both arguments to function"
   isl_*_gist_* and similar functions do not always align the paramter
   list anymore. This caused the parameter lists in JScop files to
   become out-of-sync. Since many regression tests use JScop files with
   a fixed parameter list and order, we explicitly call align_params to
   ensure a predictable parameter list.

 * ISL changed some return types to isl_size, a typedef of (signed) int.
   This caused some issues where the return type was unsigned int before:
   - No overload for std::max(unsigned,isl_size)
   - It cause additional 'mixed signed/unsigned comparison' warnings.
     Since they do not break compilation, and sizes larger than 2^31
     were never supported, I am going to fix it separately.

 * With the change to isl_size, commit
   57d547 "isl_*_list_size: return isl_size"
   also changed the return value in case of an error from 0 to -1. This
   caused undefined looping over isl_iterator since the 'end iterator'
   got index -1, never reached from the 'begin iterator' with index 0.

 * Some internal changes in ISL caused the number of operations to
   increase when determining access ranges to determine aliasing
   overlaps. In one test, this caused exceeding the default limit of
   800000. The operations-limit was disabled for this test.
2020-02-10 19:03:08 -06:00
Eli Friedman 2f6b9edfa8 [AliasAnalysis] Add missing FMRB_* enums.
Previously, the enums didn't account for all the possible cases, which
could cause misleading results (particularly for a "switch" on
FunctionModRefBehavior).

Fixes regression in polly from recent patch to add writeonly to memset.

While I'm here, also fix a few dubious uses of the FMRB_* enum values.

Differential Revision: https://reviews.llvm.org/D73154
2020-01-28 15:47:08 -08:00
Eli Friedman d9e6196312 [polly] XFAIL memset_null.ll.
I'm working on a patch, but not sure how long it'll take.
2020-01-21 17:29:44 -08:00
serge_sans_paille 24ab9b537e Generalize the pass registration mechanism used by Polly to any third-party tool
There's quite a lot of references to Polly in the LLVM CMake codebase. However
the registration pattern used by Polly could be useful to other external
projects: thanks to that mechanism it would be possible to develop LLVM
extension without touching the LLVM code base.

This patch has two effects:

1. Remove all code specific to Polly in the llvm/clang codebase, replaicing it
   with a generic mechanism

2. Provide a generic mechanism to register compiler extensions.

A compiler extension is similar to a pass plugin, with the notable difference
that the compiler extension can be configured to be built dynamically (like
plugins) or statically (like regular passes).

As a result, people willing to add extra passes to clang/opt can do it using a
separate code repo, but still have their pass be linked in clang/opt as built-in
passes.

Differential Revision: https://reviews.llvm.org/D61446
2020-01-02 16:45:31 +01:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Michael Kruse 7be6ec5fa2 [GPGPU] Fix regression test after 395124.
Commit 395124 "NVPTX: Don't insert an extra empty line at the end of the last section"
changed the length of the kernel payload. Update the regression test to the new binary size.
2019-11-13 06:20:17 +00:00
Michael Kruse d72637f5cc [ScopBuilder] Fix bug 38358 by preserving correct order of ScopStmts.
ScopBuilder::buildEqivClassBlockStmts creates ScopStmts for instruction
groups in basic block and inserts these ScopStmts into Scop::StmtMap,
however, as described in llvm.org/PR38358, comment , StmtScops are
inserted into vector ScopStmt[BB] in wrong order.  As a result,
ScopBuilder::buildSchedule creates wrong order sequence node.

Looking closer to code, it's clear there is no equivalent classes with
interleaving isOrderedInstruction(memory access) instructions after
joinOrderedInstructions.  Afterwards, ScopStmts need to be created and
inserted in the original order of memory access instructions, however,
at the moment ScopStmts are inserted in the order of leader instructions
which are probably not memory access instructions.

The fix is simple with a standalone loop scanning
isOrderedInstruction(memory access) instructions in basic block and
inserting elements into LeaderToInstList one by one.  The patch also
removes double reversing operations which are now unnecessary.

New test preserve-equiv-class-order-in-basic_block.ll is also added.

Differential Revision: https://reviews.llvm.org/D68941

llvm-svn: 375192
2019-10-17 23:55:35 +00:00
Tim Northover 1249126c7c Revert "Update polly test for SCEV change."
The motivating SCEV change was reverted as incorrect.

llvm-svn: 373185
2019-09-30 07:47:08 +00:00
Michael Kruse 241b02e762 [CodeGen] Handle outlining of CopyStmts.
Since the removal of extensions nodes from schedule trees in r362257 it
is possible to emit parallel code for SCoPs containing
matrix-multiplications. However, the code looking for references used in
outlined statement was not prepared to handle CopyStmts introduced by
the matrix-matrix multiplication detection.

In this case, CopyStmts do not introduce references in addition to the
ones captured by MemoryAccesses, i.e. we change the assertion to accept
CopyStmts and add a regression test for this case.

This fixes llvm.org/PR43164

llvm-svn: 372188
2019-09-17 22:59:43 +00:00
Michael Kruse 87baae85cd [ScopBuilder] Skip getting leader when merging statements to close holes.
Function joinOrderedInstructions merges instructions when a leader is encountered twice.
It also notices that leaders in SeenLeaders may lose their leadership in previous merging,
and tries to handle the case using following code:

    Instruction *PrevLeader = UnionFind.getLeaderValue(SeenLeaders.back());

However, this is wrong because it always gets leader for the last element of SeenLeaders,
and I believe it's wrong even we get leader for Prev here.  As a result, Statements in cases
like the one in patch aren't merged as expected.  After investigation, I believe it's
unnecessary to get leader instruction at all.  This is based on fact: Although leaders in
SeenLeaders could lose leadership, they only lose to others in SeenLeaders, in other words,
one existing leader will be chosen as new leader of merged equivalent statements.  We can
take advantage of this and simply check if current leader equals to Prev and break merging
if it does.

The patch also adds a new test.

Patch by bin.narwal <bin.narwal@gmail.com>

Differential Revision: https://reviews.llvm.org/D67007

llvm-svn: 371801
2019-09-13 01:04:38 +00:00
Michael Kruse cc0f0582c8 [Polly-ACC] Fix test after IR-printer change.
After r367755, even unnamed parameters are printed in IR dumps. Change
the test to expect te additional %0 in the line.

llvm-svn: 368763
2019-08-13 22:42:08 +00:00
Eli Friedman c68dd359ae Update polly test for SCEV change.
r366419 adds nsw to more SCEV expressions, which allows polly to
make more aggressive assumptions about the input expressions.

llvm-svn: 366510
2019-07-18 22:35:45 +00:00
Michael Kruse 88afd75300 [test] Add wrap flags after D61934.
https://reviews.llvm.org/D61934, committed as r362687, r363540, r363364
and r363147, made some emitted instruction nus/nsw. Add these falgs to
Polly's regression tests.

This should fix
    Polly :: Isl/CodeGen/partial_write_in_region_with_loop.ll
    Polly :: Isl/CodeGen/scev_expansion_in_nonaffine.ll

llvm-svn: 363599
2019-06-17 19:17:07 +00:00
Michael Kruse aa8a976174 [ScheduleOptimizer] Hoist extension nodes after schedule optimization.
Extension nodes make schedule trees are less flexible: Many operations,
such as rescheduling, do not work on such schedule trees with extension.
As such, some functionality such as determining parallel loops in isl's
AST are disabled.

Currently, only the pattern-matching generalized matrix-matrix
multiplication optimization adds extension nodes (to add copy-in
statements).

This patch removes all extension nodes as the last step of the schedule
optimization by hoisting the extension node's added domain up to the
root domain node. All following passes can assume that schedule trees
work without restrictions, including the parallelism test. Mark the
outermost loop of the optimized matrix-matrix multiplication as parallel
such that -polly-parallel is able to parallelize that loop.

Differential Revision: https://reviews.llvm.org/D58202

llvm-svn: 362257
2019-05-31 19:26:57 +00:00