This reverts commit 329aeb5db4,
and relands commit 61f006ac65.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
Emit llvm.loop.parallel_accesses metadata instead of
llvm.mem.parallel_loop_access. The latter is deprecated because it
assumes that LoopIDs are persistent, which they are not.
We also emit parallel access metadata for all surrounding parallel
loops, not just the innermost parallel.
DetectionContext objects are stored as values in a DenseMap. When the
DenseMap reaches its maximum load factor, it is resized and all its
objects moved to a new memory allocation. Unfortunately Scop object have
a reference to its DetectionContext. When the DenseMap resizes, all the
DetectionContexts reference now point to invalid memory, even if caused
by an unrelated DetectionContext.
Even worse, NewPM's ScopPassManager called isMaxRegionInScop with the
Verify=true parameter before each pass. This caused the old
DetectionContext to be removed an a new on created and re-verified.
Of course, the Scop object was already created pointing to the old
DetectionContext. Because the new DetectionContext would
usually be stored at the same position in the DenseMap, the reference
would usually reference the new DetectionContext of the same Region.
Usually.
If not, the old position still points to memory in the DenseMap
allocation (unless also a resizing occurs) such that tools like Valgrind
and AddressSanitizer would not be able to diagnose this.
Instead of storing the DetectionContext inside the DenseMap, use a
std::unique_ptr to a DetectionContext allocation, i.e. it will not move
around anymore. This also allows use to remove the very strange
DetectionContext(const DetectionContext &&)
copy/move(?) constructor. DetectionContext objects now are neither
copied nor moved.
As a result, every re-verification of a DetectionContext will use a new
allocation. Therefore, once a Scop object has been created using a
DetectionContext, it must not be re-verified (the Scop data structure
requires its underlying Region to not change before code generation
anyway). The NewPM may call isMaxRegionInScop only with
Validate=false parameter.
ZoneAlgorithms's computePHI relies on being provided with consistent a
schedule to compute the statement prodecessors of a statement containing
PHINodes. Otherwise unexpected results such as PHI nodes with multiple
predecessors can occur which would result in problems in the
algorithms expecting consistent data.
In the added test case, statement instances are scrubbed from the
SCoP their execution would result in undefined behavior (Due to a nsw
overflow). As already being undefined behavior in LLVM-IR, neither
AssumedContext nor InvalidContext are updated, giving computePHI no
means to avoid these cases.
Intoduce a new SCoP property, the DefinedBehaviorContext, that among
the runtime-checked conditions, also tracks the assumptions not needing
a runtime check, in particular those affecting the assumed control flow.
This replaces the manual combination of the 3 other contexts that was
already done in computePHI and setNewAccessRelation. Currently, the only
additional assumption is that loop induction variables will nsw flag for
not wrap, but potentially more can be added. Use in
hasFeasibleRuntimeContext, isl::ast_build and gisting are other
potential uses.
To limit computational complexity, the DefinedBehaviorContext is not
availabe if it grows too large (atm hardcoded to 8 disjuncts).
Possible other fixes include bailing out in computePHI when
inconsistencies are detected, choose an arbitrary value for inconsistent
cases (since it is undefined behavior anyways), or make the code
receiving the result from ComputePHI handle inconsistent data. All of
them reduce the quality of implementation having to bail out more often
and disabling the ability to assert on actually wrong results.
This fixes llvm.org/PR48783.
This fixes llvm.org/PR48554
Some test cases had to be updated because the hash function for
union_maps have been changed which affects the output order.
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93793
ScalarEvolution::getSCEV cannot be used during codegen. ScalarEvolution
assumes a stable IR and control flow which is under construction during
Polly's CodeGen. In particular, it uses DominatorTree for compute the
backedge taken count. However the DominatorTree is not updated during
codegen.
In this case, SCEV was used to determine the base pointer of an array
access. Replace it by our own function. Polly generates only GEP and
BitCasts for array acceses, i.e. it is sufficient to handle these to to
find the base pointer.
Fixes llvm.org/PR48422
And use it to model LLVM IR's `ptrtoint` cast.
This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)
As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
to not loose pointer-ness of an expression in more cases
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.
But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D89456
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.
Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.
This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.
I've kept&improved the tests though.
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end there with constant zext/sext of a pointer,
so i didn't handle those cases there until there is a test case.
Original commit message:
While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.
This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88806
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806
It caused the following assert during Chromium builds:
llvm/lib/IR/Constants.cpp:1868:
static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.
See code review for a link to a reproducer.
This reverts commit 1c021c64ca.
While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.
This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88806
The test failed since commit
bc10888dc "DomTree: Make PostDomTree indifferent to block successors swap"
which is a re-commit of
c35585e20 "DomTree: Make PostDomTree immune to block successors swap"
This will allow us to use the datalayout to disambiguate other
constructs in IR, like load alignment. Split off from D78403.
Differential Revision: https://reviews.llvm.org/D78413
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.
Differential Revision: https://reviews.llvm.org/D77984
The option is passed as argv to ISL's command line option parser.
Polly's own own command line options take precedence over options passed
as `-polly-isl-arg`. For instance,
`-polly-isl-arg=--schedule-outer-coincidence` will be ignored in favor
of `-polly-opt-outer-coincidence`.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D77303
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae311 with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
Static chunked OpenMP scheduling has not been treated correctly.
This patch fixes the problem that threads would not process their
(work-)chunks as intended.
Differential Revision: https://reviews.llvm.org/D61081
Since the removal of extensions nodes from schedule trees in r362257 it
is possible to emit parallel code for SCoPs containing
matrix-multiplications. However, the code looking for references used in
outlined statement was not prepared to handle CopyStmts introduced by
the matrix-matrix multiplication detection.
In this case, CopyStmts do not introduce references in addition to the
ones captured by MemoryAccesses, i.e. we change the assertion to accept
CopyStmts and add a regression test for this case.
This fixes llvm.org/PR43164
llvm-svn: 372188
https://reviews.llvm.org/D61934, committed as r362687, r363540, r363364
and r363147, made some emitted instruction nus/nsw. Add these falgs to
Polly's regression tests.
This should fix
Polly :: Isl/CodeGen/partial_write_in_region_with_loop.ll
Polly :: Isl/CodeGen/scev_expansion_in_nonaffine.ll
llvm-svn: 363599
At the end of a region statement, the PHINode must be generated
while the current IRBuilder's block is the region's exit node. For
obvious reasons: The PHINode references the region's exiting block.
A partial write would insert new control flow, i.e. insert new basic
blocks between the exiting blocks and the current block.
We fix this by generating the PHI nodes (region exit values) before
generating any MemoryAccess's stores.
This should fix the AOSP buildbot.
Reported-by: Eli Friedman <efriedma@quicinc.com>
llvm-svn: 361204
The ParallelLoopGenerator class is changed such that GNU OpenMP specific
code was removed, allowing to use it as super class in a
template-pattern. Therefore, the code has been reorganized and one may
not use the ParallelLoopGenerator directly anymore, instead specific
implementations have to be provided. These implementations contain the
library-specific code. As such, the "GOMP" (code completely taken from
the existing backend) and "KMP" variant were created.
For "check-polly" all tests that involved "GOMP": equivalents were added
that test the new functionalities, like static scheduling and different
chunk sizes. "docs/UsingPollyWithClang.rst" shows how the alternative
backend may be used.
Patch by Michael Halkenhäuser <michaelhalk@web.de>
Differential Revision: https://reviews.llvm.org/D59100
llvm-svn: 356434
compiler identification lines in test-cases.
(Doing so only because it's then easier to search for references which
are actually important and need fixing.)
llvm-svn: 351200
Summary:
Update all rdtscp callsites in PerfMonitor so that they conform with the signature changes introduced in r341698.
Reviewers: grosser, bollu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51928
llvm-svn: 341946
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).
I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.
All LLVM files are already reviewed in D48338.
Reviewers: jdoerfert, bollu, efriedma
Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia
Differential Revision: https://reviews.llvm.org/D48453
llvm-svn: 335292
Add the options -polly-codegen-trace-stmts and
-polly-codegen-trace-scalars. When enabled, adds a call to the
beginning of every generated statement that prints the executed
statement instance. With -polly-codegen-trace-scalars, it also prints
the value of all scalars that are used in the statement, and PHIs
defined in the beginning of the statement.
Differential Revision: https://reviews.llvm.org/D45743
llvm-svn: 330864
A check in assert-builds was meant to verify that a load provides a
value in all statement instances (i.e. its domain). The domain is
commonly gist'ed within the parameter context to contain fewer
constraints. However, statement instances outside the context are
no valid executions, hence the value provided can be undefined.
Refine the check for valid loads to only needed to be defined within
the SCoP context.
In addition, the JSONImporter had to be changed to allow importing
access relations that are broader than the current access relation,
but still defined over all statement instances.
This should fix the compiler crash in test-suite's oggenc of the
-polly-process-unprofitable buildbot.
llvm-svn: 329655
Summary:
When checking the parallelism of a scheduling dimension, we first check if excluding reduction dependences the loop is parallel or not.
If the loop is not parallel, then we need to return the minimal dependence distance of all data dependences, including the previously subtracted reduction dependences.
Reviewers: grosser, Meinersbur, efriedma, eli.friedman, jdoerfert, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D45236
llvm-svn: 329214
Splitting basic blocks into multiple statements if there are now
additional scalar dependencies gives more freedom to the scheduler, but
more statements also means higher compile-time complexity. Switch to
finer statement granularity, the additional compile time should be
limited by the number of operations quota.
The regression tests are written for the -polly-stmt-granularity=bb
setting, therefore we add that flag to those tests that break with the
new default. Some of the tests only fail because the statements are
named differently due to a basic block resulting in multiple statements,
but which are removed during simplification of statements without
side-effects. Previous commits tried to reduce this effect, but it is
not completely avoidable.
Differential Revision: https://reviews.llvm.org/D42151
llvm-svn: 324169
Memory transfer instructions take two pointers. It is not defined to
which of those a noalias annotation applies. To ensure correctness,
do not add noalias annotations to memcpy/memmove instructions anymore.
The caused a miscompile with test-suite's MultiSource/Applications/obsequi.
Since r321138, the MemCpyOpt pass would remove memcpy/memmove calls if
known to copy uninitialized memory. In that case, it was initialized
by another memcpy, but the annotation for the target pointer said
it would not alias. The annotation was actually meant for the source
pointer, which was was an alloca and could not alias with the target
pointer.
llvm-svn: 321371
Isl does not allow generating isl_ast_expr from an isl_pw_aff that has an
empty domain (i.e. has no pieces). We already detected the case if the
isl_pw_aff comes with an empty domain.
isl_ast_build also considers the domain empty if it is disjoint with the
parameter context (e.g. parameters values that we exclude by runtime
versioning).
Intersect the access relation domain with the parameter context to
also detect such practically empty access domains. The effective
pointer used in the generated code is unimportand because it will never
be executed.
This fixes llvm.org/PR35362
llvm-svn: 318806
When collecting base pointers that need to be made available in parallel
subfunctions, use the base pointer associated with the latest
ScopArrayInfo, instead of the original one.
llvm-svn: 316983
After rL315683 (improve SCEV to calculate max BETakenCount when end
bound of loop is variant and loop is of form {Start,+1, Stride} LT End)
this test in polly started failing.
However, as discussed in https://reviews.llvm.org/rL315683,
this polly test is not a loops bound test and the MaxBECount calculated by
SCEV looks correct. The max BECount is the value calculated even when the end
bound of loop is invariant.
As discussed with Tobias offline, I'm marking this as an XFAIL, until he
gets a chance to update the testcase, so the build bot goes to green.
llvm-svn: 315912
This test XFAILs two test that start to fail when verifying DT's
DFS numbers, as per Tobias' suggestion.
Related VerifyDFSNumbers patch: D38331.
llvm-svn: 314800
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
This is a recommit of r312663 after fixing
test/Isl/CodeGen/phi_after_error_block_outside_of_scop.ll
llvm-svn: 314075
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.
Thanks to Eli Friedman for reporting this issue and reducing the test case.
llvm-svn: 314065