LLVM's coding guidelines suggest not using @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly when writing new code.
llvm-svn: 280468
Change the code around setNewAccessRelation to allow using an existing array
element for memory instead of an ad-hoc alloca. This facility will be used for
DeLICM/DeGVN to convert scalar dependencies into regular ones.
The changes necessary include:
- Make the code generator use the implicit locations instead of the alloca ones.
- A test case
- Make the JScop importer accept changes of scalar accesses for that test case.
- Adapt the MemoryAccess interface to the fact that the MemoryKind can change.
The accessors are named (get|is)OriginalXXX() to get the status of the memory access
before any change by setNewAccessRelation() (some properties such as
getIncoming() do not change even if the kind is changed and are still
required). To get the modified properties, there is (get|is)LatestXXX(). The
old accessors without Original|Latest become synonyms of the
(get|is)OriginalXXX() to not make functional changes in unrelated code.
Differential Revision: https://reviews.llvm.org/D23962
llvm-svn: 280408
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that previously dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we generate uses that are
not dominated by the values they use.
This fixes: http://llvm.org/PR28984
llvm-svn: 279047
To do so we change the way array extents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.
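The extent computation described above can be sketched as follows. This is only an illustration under stated assumptions: `DimRange` and `computeExtent` are hypothetical names, and plain integer vectors stand in for the isl sets the actual code operates on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not Polly's API): an inclusive index range.
struct DimRange {
  int64_t Min;
  int64_t Max;
};

// Over-approximate the extent of an array. The first dimension spans the
// smallest to the largest accessed index; every inner dimension spans its
// full declared size, no matter which elements were actually accessed.
std::vector<DimRange>
computeExtent(const std::vector<int64_t> &FirstDimAccesses,
              const std::vector<int64_t> &InnerSizes) {
  assert(!FirstDimAccesses.empty() && "need at least one access");
  auto MinMax = std::minmax_element(FirstDimAccesses.begin(),
                                    FirstDimAccesses.end());
  std::vector<DimRange> Extent;
  Extent.push_back({*MinMax.first, *MinMax.second});
  for (int64_t Size : InnerSizes)
    Extent.push_back({0, Size - 1});
  return Extent;
}
```

For accesses to rows 3, 7 and 5 of an array with inner dimension size 16, the sketch yields rows 3 to 7 and columns 0 to 15, even if only a few columns were touched.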
We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.
llvm-svn: 278212
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocations, and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.
llvm-svn: 278126
This increases the readability of the IR and also clarifies that the GPU
initialization is executed _after_ the scalar initialization, which needs
to happen before the code of the transformed scop is executed.
Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.
llvm-svn: 278125
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitialized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.
Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.
llvm-svn: 278124
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination, if
they have a corresponding instruction in the original BB that belongs to
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also consider code for dead code elimination, which
does not have a corresponding instruction in BB.
This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.
llvm-svn: 278103
When adding code that avoids passing values used in isl expressions and
LLVM instructions twice, we forgot to make the single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.
llvm-svn: 278039
There is no need to reset the position of the builder, as we can just continue
to insert code at the current position of the IRBuilder, which happens to
be precisely the location we reset the builder to.
llvm-svn: 278014
... instead of adding instructions at the end of the basic block the builder
is currently at. This makes it easier to reason about where IR is generated,
as with the IRBuilder there is just a single location that specifies where
IR is generated.
llvm-svn: 278013
The map is iterated over when generating the values escaping the SCoP. The
non-deterministic iteration order of DenseMap causes the output IR to change at
every compilation, adding noise to comparisons.
Replace DenseMap by a MapVector to ensure the same iteration order at every
compilation.
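The idea behind an insertion-ordered map can be sketched with standard containers. This is a simplified stand-in, not llvm::MapVector itself: `OrderedMap` and `keysInOrder` are hypothetical names, and string keys are assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for an insertion-ordered map (NOT llvm::MapVector).
class OrderedMap {
  typedef std::vector<std::pair<std::string, int> > EntryList;
  std::unordered_map<std::string, std::size_t> Index; // key -> slot in Entries
  EntryList Entries;                                  // insertion order

public:
  // Insert the key with value 0 if it is new; return a reference to its value.
  int &operator[](const std::string &Key) {
    std::unordered_map<std::string, std::size_t>::iterator It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.insert(std::make_pair(Key, Entries.size())).first;
      Entries.push_back(std::make_pair(Key, 0));
    }
    assert(It->second < Entries.size());
    return Entries[It->second].second;
  }

  // Iteration walks the vector, so the order never depends on hashing.
  EntryList::const_iterator begin() const { return Entries.begin(); }
  EntryList::const_iterator end() const { return Entries.end(); }
};

// Collect the keys in iteration order.
std::vector<std::string> keysInOrder(const OrderedMap &M) {
  std::vector<std::string> Keys;
  for (const auto &KV : M)
    Keys.push_back(KV.first);
  return Keys;
}
```

Lookups still go through the hash table, while anything that iterates (such as emitting IR for each entry) sees the same order on every run.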
llvm-svn: 277832
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly already assumes an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.
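To illustrate why only the inner dimension sizes take part in the linearization, here is a minimal sketch. The function name `linearize` and the use of plain integers are assumptions for illustration; the real code builds LLVM-IR expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linearize a multi-dimensional index. The outermost dimension is unsized,
// so there is exactly one size fewer than there are indices.
int64_t linearize(const std::vector<int64_t> &Indices,
                  const std::vector<int64_t> &InnerSizes) {
  assert(!Indices.empty() && Indices.size() == InnerSizes.size() + 1);
  int64_t Offset = Indices[0];
  for (std::size_t D = 0; D < InnerSizes.size(); ++D)
    Offset = Offset * InnerSizes[D] + Indices[D + 1];
  return Offset;
}
```

For an array declared as A[][4][8], the element A[2][3][5] is at offset (2 * 4 + 3) * 8 + 5 = 93. Supplying an outermost size as well would shift every size by one position and produce a wrong offset.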
llvm-svn: 277802
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.
llvm-svn: 277799
Pass the content of scalar array references to the alloca on the kernel side
and do not pass them additionally as normal LLVM scalar values.
llvm-svn: 277699
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.
llvm-svn: 277589
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D22828
llvm-svn: 277263
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.
llvm-svn: 276961
Also factor out getArraySize() to avoid code duplication and reorder some
function arguments to indicate the direction into which data is transferred.
llvm-svn: 276636
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.
llvm-svn: 276635
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.
llvm-svn: 276623
This allows the finalization routine of the IslNodeBuilder to be overwritten
by derived classes. While here, we also drop the unnecessary 'Scop' suffix
and the unnecessary 'Scop' parameter.
llvm-svn: 276622
We optimize the kernel _after_ dumping the IR we generate to make the IR we
dump easier to read and independent of possible changes in the general
purpose LLVM optimizers.
llvm-svn: 276551
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code to a string.
To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.
Finally, the NVPTX backend has trouble generating code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them in future (not
immediate) changes one by one.
llvm-svn: 276396
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.
For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.
llvm-svn: 276270
This is useful for external users using IslExprBuilder, in case they cannot
embed ScopArrayInfo data into their isl_ids, because the isl_ids either already
carry other information or the isl_ids have been created and their user pointers
cannot be updated any more.
llvm-svn: 276268
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids troubles in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen, in case the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.
llvm-svn: 276263
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.
llvm-svn: 275987
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.
llvm-svn: 275957
Create for each kernel a separate LLVM-IR module containing a single function
marked as kernel function and taking one pointer for each array referenced
by this kernel. Add debugging output to verify the kernels are generated
correctly.
llvm-svn: 275952
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly. Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, what is left for
us to do is only to connect the necessary functionality to compute array
reference information.
llvm-svn: 275798
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The
IslNodeBuilder will take care of generating all general-purpose ast nodes, but
we provide our own createUser implementation to handle the different GPU
specific user statements. For now, we just skip any user statement and only
generate a host-code skeleton, but in subsequent commits we will add handling of
normal ScopStmt's performing computations, kernel calls, as well as host-device
data transfers. We will also introduce run-time check generation and LICM in
subsequent commits.
llvm-svn: 275783
Otherwise ppcg would try to call into pet functionality that is not available,
which obviously will cause trouble. As we can easily print these statements
ourselves, we just do so.
llvm-svn: 275579
This option increases the scalability of the scheduler and allows us to remove
the 'gisting' workaround we introduced in r275565 to handle a more complicated
test case. Another benefit of using this option is also that the generated
code looks a lot more streamlined.
Thanks to Sven Verdoolaege for reminding me of this option.
llvm-svn: 275573
This works around a shortcoming of the isl scheduler, which even for some
smaller test cases does not terminate in case domain constraints are part
of the flow dependences.
llvm-svn: 275565
It seems we forgot to actually add the memory access ids to the tagged accesses,
but instead just tagged the accesses with empty isl_ids. This issue was found
by inspection and without code generation it is difficult to test just by
itself. We fix it for now without test case and expect our code generation
tests to cover this later on.
llvm-svn: 275557
Instead of directly linking to ppcg's main source directory, we link to the
parent directory. This allows us to access ppcg's include files with
'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header.
Also drop an include directory that is currently not used.
llvm-svn: 275536
For this we need to provide an explicit list of statements as they occur in
the polly::Scop to ppcg.
We also set up basic AST printing facilities to facilitate debugging. To allow
code reuse some (minor) changes in ppcg have been necessary.
llvm-svn: 275436
Instead of calling to a pet function that does not return anything, we pass
our own dummy implementation to ppcg that always returns a nullptr. This
ensures that the list of ast expressions always contains a nullptr and we do
not accidentally free a random (uninitialized) pointer. This resolves the
last valgrind warning we see.
We will provide an implementation for this function once the generated AST
expressions can be used and consequently can be tested.
llvm-svn: 275435
The tile size was previously uninitialized. As a result, it was often zero
(i.e., no tiling), which is not what we want in general. More importantly, there
was the risk of arbitrary tile sizes being chosen, which we did not observe, but
which still is highly problematic.
llvm-svn: 275418
This change now applies ppcg's GPU mapping on our initial schedule. For this
to work, we need to also initialize the set of all names (isl_ids) used in
the scop as well as the program context.
llvm-svn: 275396
To do so we copy the necessary information to compute an initial schedule from
polly::Scop to ppcg's scop. Most of the necessary information is directly
available and only needs to be passed on to ppcg, with the exception of 'tagged'
access relations, access relations that additionally carry information about
which memory access an access relation originates from.
We could possibly perform the construction of tagged accesses as part of
ScopInfo, but as this format is currently specific to ppcg we do not do this
yet, but keep this functionality local to our GPU code generation.
After the scop has been initialized, we compute data dependences and ask ppcg to
compute an initial schedule. Some of this functionality is already available in
polly::DependenceInfo and polly::ScheduleOptimizer, but to keep differences
to ppcg small we use ppcg's functionality here. We may later investigate if
a closer integration of these tools makes sense.
llvm-svn: 275390
At this stage, we do not yet modify the IR but just generate a default
initialized ppcg_scop and gpu_prog and free both immediately. Both will later be
filled with data from the polly::Scop and are needed to use PPCG for GPU
schedule generation. This commit does not yet perform any GPU code generation,
but ensures that the basic infrastructure has been put in place.
We also add a simple test case to ensure the new code is run and use this
opportunity to verify that GPU_CODEGEN tests are only run if GPU code generation
has been enabled in cmake.
llvm-svn: 275389
Add a new pass to serve as basis for automatic accelerator mapping in Polly.
The pass structure and the analyses preserved are copied from
CodeGeneration.cpp, as we will rely on IslNodeBuilder and IslExprBuilder for
LLVM-IR code generation.
Polly's accelerator code generation is enabled with -polly-target=gpu.
I would like to use this commit as an opportunity to thank Yabin Hu for his work
in the context of two Google Summer of Code projects during which he implemented
initial prototypes of the Polly accelerator code generation -- in parts this
code is already available in today's Polly (e.g., tools/GPURuntime). More will
come as part of the upcoming Polly ACC changes.
Reviewers: Meinersbur
Subscribers: pollydev, llvm-commits
Differential Revision: http://reviews.llvm.org/D22036
llvm-svn: 275275
Commit r275056 introduced a gcc compile failure due to us using two
types named 'Type', the first being the newly introduced member variable
'Type' the second being llvm::Type. We resolve this issue by renaming
the newly introduced member variable to AccessType.
llvm-svn: 275057
Summary:
With a struct we can use named accessors instead of generic std::get<3>()
calls. This increases readability of the source code.
Reviewers: jdoerfert
Subscribers: pollydev, llvm-commits
Differential Revision: http://reviews.llvm.org/D21955
llvm-svn: 275056
This is a regular maintenance update to ensure the latest version of isl is
tested.
Interesting Changes:
- AST nodes and expressions are now printed as YAML
llvm-svn: 274614
Since r274197 -polly-position=before-vectorizer caused various LNT failures
for example in SingleSource/Benchmarks/Linpack. These failures seem to only
occur when the CFLAA pass is scheduled in our codegen-cleanup passes, which
suggests that the way we call this AA pass is somehow problematic. As this pass
is not of high importance, we drop the pass for now to prevent these failures
from happening. At a later point, we might investigate more in-depth why this
specific usage scenario caused correctness issues.
llvm-svn: 274427
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consistently use
namespace comments in Polly.
There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.
To reproduce this fix run:
for i in `ls tools/polly/lib/*/*.cpp`; do \
  clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
  -header-filter=".*"; \
done
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273621
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273436
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273435
ScalarReplAggregatesPass was deprecated and replaced by SROAPass.
ScalarReplAggregatesPass got finally removed in LLVM commit r272737, hence this
patch is also a compile fix.
llvm-svn: 272783
As part of this simplification we pull complex logic out of the loop body and
skip the previously redundantly executed first loop iteration.
This is a partial recommit of r271514 and r271535 which were in conflict with
the revert in r272483 and consequently also had to be reverted temporarily. The
original patch was contributed by Johannes Doerfert.
This patch is mostly a NFC, but dropping the first loop iteration can sometimes
result in slightly simpler code.
llvm-svn: 272502
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:
- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"
Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.
llvm-svn: 272483
This patch refactors the code generation for divisions. This allows us to
always generate a shift for a power-of-two division and to utilize
information about constant divisors in order to truncate the result
type.
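As a rough illustration of the power-of-two case, the sketch below compares a reference floor division with an arithmetic right shift. The function names are hypothetical, and the sketch assumes floor-division semantics and an arithmetic shift for negative signed values (guaranteed since C++20, and what mainstream compilers did before); the commit itself does not spell out these details.

```cpp
#include <cassert>
#include <cstdint>

// Reference floor division (rounds toward negative infinity), for D > 0.
int64_t floorDiv(int64_t N, int64_t D) {
  assert(D > 0);
  int64_t Q = N / D; // C++ division truncates toward zero
  if (N % D != 0 && N < 0)
    --Q;
  return Q;
}

// True if D is a positive power of two.
bool isPowerOfTwo(int64_t D) { return D > 0 && (D & (D - 1)) == 0; }

// Floor division by a power of two, expressed as an arithmetic right shift.
int64_t floorDivPow2(int64_t N, int64_t D) {
  assert(isPowerOfTwo(D));
  int Shift = 0;
  while ((int64_t{1} << Shift) != D)
    ++Shift;
  return N >> Shift;
}
```

Both functions agree for negative dividends as well, e.g. -7 divided by 4 gives -2 either way, which is what makes the shift a drop-in replacement under these assumptions.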
llvm-svn: 271898
We now generate runtime checks __after__ the SCoP code generation and
not before, though they are still inserted at the same position in
the code. This allows modifying the runtime check during SCoP code
generation.
llvm-svn: 271894
We now use the minimal necessary bit width for the generated code. If
operations might overflow (add/sub/mul) we will try to adjust the types in
order to ensure a non-wrapping computation. If the type adjustment is not
possible, i.e., the necessary type is bigger than the type value of
--polly-max-expr-bit-width, we will use assumptions to verify the computation
will not wrap. However, for run-time checks we cannot build assumptions but
instead utilize overflow tracking intrinsics.
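What an overflow-tracking addition reports can be sketched portably by computing in a wider type. This is not the generated IR (LLVM offers intrinsics such as llvm.sadd.with.overflow for this purpose); `trackedAdd` is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Add two 32-bit values and record in Overflowed whether the exact result
// does not fit into 32 bits. The flag is sticky, so one flag can guard a
// whole expression made of several operations.
int32_t trackedAdd(int32_t A, int32_t B, bool &Overflowed) {
  int64_t Wide = static_cast<int64_t>(A) + static_cast<int64_t>(B);
  bool Fits = Wide >= std::numeric_limits<int32_t>::min() &&
              Wide <= std::numeric_limits<int32_t>::max();
  if (!Fits) {
    Overflowed = true;
    return 0; // placeholder; the result must not be used once the flag is set
  }
  assert(Fits);
  return static_cast<int32_t>(Wide);
}
```

A run-time check built this way evaluates the whole expression first and then treats the check as failed if the flag was set at any point.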
llvm-svn: 271878
In case of modulo compared to zero, we need to do signed modulo
operation as unsigned can give different results based on whether the
dividend is negative or not.
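A small sketch of the difference, with hypothetical helper names: for a negative dividend, reinterpreting the bits as unsigned changes the remainder, so a comparison against zero can flip.

```cpp
#include <cassert>
#include <cstdint>

// Test "N is a multiple of D" using a signed remainder.
bool isMultipleSigned(int32_t N, int32_t D) {
  assert(D > 0);
  return N % D == 0;
}

// The same test with an unsigned remainder. For negative N the bit pattern
// is interpreted as a large positive number, so the result can differ.
bool isMultipleUnsigned(int32_t N, int32_t D) {
  assert(D > 0);
  return static_cast<uint32_t>(N) % static_cast<uint32_t>(D) == 0;
}
```

For N = -3 and D = 3 the signed variant reports a multiple, while the unsigned variant computes 4294967293 % 3 = 1 and reports the opposite. For non-negative dividends both variants agree.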
This addresses llvm.org/PR27707
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Reviewers: _jdoerfert, grosser, Meinersbur
Differential Revision: http://reviews.llvm.org/D20145
llvm-svn: 271707
Operands of binary operations that might overflow will be temporarily
promoted to i64 again, though that is not a sound solution for the problem.
llvm-svn: 271538
We now have a simple function to adjust/unify the types of two (or three)
operands before an operation that requires the same type for all operands.
Due to this change we will not promote parameters that are added to i64
anymore if that is not needed.
llvm-svn: 271513
Created a new pass ScopInfoRegionPass. As the name suggests, it is a
region pass and it is there to preserve compatibility with our
existing Polly passes. ScopInfoRegionPass will return a SCoP object
for a valid region while the creation of the SCoP stays in the
ScopInfo class.
Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D20770
llvm-svn: 271259
Summary:
API-wise `apply` is a somewhat unidiomatic one-off function, and
removing the only(?) use in polly will let me remove it from SCEV's
exposed interface.
Reviewers: jdoerfert, Meinersbur, grosser
Subscribers: grosser, mcrosier, pollydev
Differential Revision: http://reviews.llvm.org/D20779
llvm-svn: 271177
We utilize assumptions on the input to model IR in polyhedral world.
To verify these assumptions we version the code and guard it with a
runtime-check (RTC). However, since the RTCs are themselves generated
from the polyhedral representation we generate them under the same
assumptions that they should verify. In other words, the guarantees
that we try to provide with the RTCs do not hold for the RTCs
themselves. To this end it is necessary to employ a different check
for the RTCs that will verify the assumptions did hold for them too.
Differential Revision: http://reviews.llvm.org/D20165
llvm-svn: 269299
Previously we checked the number of pieces to decide whether or not a
an invariant load was too complex to be generated. However, there are
cases when e.g., divisions cause the complexity to spike regardless of
the number of pieces. To this end we now check the number of totally
involved dimensions which will increase with the number of pieces but
also the number of divisions.
llvm-svn: 269045
Min/max expressions are easier to read and can in some cases also result in
more concise IR that is generated as the min/max -- when lowered to a
cmp+select pattern -- commonly has a simpler condition than the ternary
condition isl would normally generate.
llvm-svn: 268855
The check for complexity compares the number of polyhedra in a set,
which are combined by disjunctions (union, "OR"),
not conjunctions (intersection, "AND").
llvm-svn: 268223
If the base pointer of an invariant load is loaded conditionally, that
condition needs to hold for the invariant load too. The structure of the
program will imply this for domain constraints but not for imprecisions in
the modeling. To this end we will propagate the execution context of base
pointers during code generation and thus ensure the derived pointer does
not access an invalid base pointer.
llvm-svn: 267707
In r247147 we disabled pointer expressions because the IslExprBuilder did not
fully support them. This patch reintroduces them by simply treating them as
integers. The only special handling for pointers that is left detects the
comparison of two address_of operands and uses an unsigned compare.
llvm-svn: 265894
We have verified the optimized function for a long time now, and this has
helped to track down bugs early. This will now also happen for all parallel
subfunctions we generate.
llvm-svn: 265823
The findValues() function did not look through div & srem instructions
that were part of the argument SCEV. However, in different other
places we already look through it. This mismatch caused us to preload
values in the wrong order.
llvm-svn: 265775
If a non-affine region PHI is generated we should not move the insert
point prior to the synthesized value in the same block as we might
split that block at the insert point later on. Only if the incoming
value should be placed in a different block we should change the
insertion point.
llvm-svn: 265132
This pass is not enabled in the default tool chain and currently can run into an
infinite loop, due to other parts of LLVM generating incorrect IR
(http://llvm.org/PR27065) -- which is not executed and consequently does not
seem to disturb other passes. As this pass is not really needed, we can just
drop it to get our build clean.
This fixes the timeout issues in MultiSource/Benchmarks/MiBench/consumer-jpeg
and MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg for
-polly-position=before-vectorizer -polly-process-unprofitable. Unfortunately,
we are still left with a miscompile in cjpeg.
llvm-svn: 264396
When generating code for invariant loads, in some rare cases we cannot generate code
and bail out. This change ensures that we maintain a valid dominator tree
in these situations. This fixes llvm.org/PR26736
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 264142
Value merging is only necessary for scalars when they are used outside
of the scop. While an array's base pointer can be used after the scop,
it gets an extra ScopArrayInfo of type MK_Value. We used to generate
phi's for both of them, where one was assuming the result of the other
phi would be the original value, because it has already been replaced by
the previous phi. This resulted in IR that the current IR verifier
allows, but is probably illegal.
This reduces the number of LNT test-suite fails with
-polly-position=before-vectorizer -polly-process-unprofitable
from 16 to 10.
Also see llvm.org/PR26718.
llvm-svn: 262629
Polly recognizes affine loops that ScalarEvolution does not, in
particular those with loop conditions that depend on hoisted invariant
loads. Check for SCEVAddRec dependencies on such loops and do not
consider their exit values as synthesizable because SCEVExpander would
generate them as expressions that depend on the original induction
variables. These are not available in generated code.
llvm-svn: 262404
In order to speed up compile time and to avoid random timeouts we now
separately track assumptions and restrictions. In this context
assumptions describe parameter valuations we need and restrictions
describe parameter valuations we do not allow. During AST generation
we create a runtime check for both, whereas the one for the
restrictions is negated before a conjunction is built.
Except for the in-bounds assumptions we currently only track restrictions.
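How the two parts combine into one run-time check can be sketched as follows. The predicate type and the function `runtimeCheck` are hypothetical; in Polly both parts are isl sets over the parameters rather than C++ callables.

```cpp
#include <cassert>
#include <functional>

// A condition over a single parameter value (stand-in for an isl set).
typedef std::function<bool(long)> ParamPredicate;

// The optimized code may run only if the assumption holds for the parameter
// and the restriction does not: Assumption(N) && !Restriction(N).
bool runtimeCheck(long N, const ParamPredicate &Assumption,
                  const ParamPredicate &Restriction) {
  assert(Assumption && Restriction && "both predicates must be callable");
  return Assumption(N) && !Restriction(N);
}
```

With the assumption N >= 0 and the restriction N > 1000, the check passes for N = 5 and fails for both N = -1 and N = 2000.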
Differential Revision: http://reviews.llvm.org/D17247
llvm-svn: 262328
This allows constructing run-time checks for a scop without having to generate
a full AST. This is currently not taken advantage of in Polly itself, but
external users may benefit from this feature.
llvm-svn: 262009
Check the ModRefBehaviour of functions in order to decide whether or
not a call instruction might be acceptable.
Differential Revision: http://reviews.llvm.org/D5227
llvm-svn: 261866
The generated dedicated subregion exit block was assumed to have the same
dominance relation as the original exit block. This is incorrect if the exit
block receives other edges than only from the subregion, with the result that,
e.g., the subregion's entry block does not dominate the exit block.
llvm-svn: 261865
Replace Scop::getStmtForBasicBlock and Scop::getStmtForRegionNode, and
add overloads for llvm::Instruction and llvm::RegionNode.
getStmtFor and overloads become the common interface to get the Stmt
that contains something. Named after LoopInfo::getLoopFor and
RegionInfo::getRegionFor.
llvm-svn: 261791
This is also caught by the function verifier, but disconnected from
the place that produced it. Catch it already at creation to be able to
reason more directly about the cause.
llvm-svn: 261790
This allows other passes and transformations to use some of the existing AST
building infrastructure. This is not yet used in Polly itself.
llvm-svn: 261496
We now always print the reason why the code did not pass the LLVM verifier and
we also allow disabling verification with -polly-codegen-verify=false. Before
this change the first assertion had generally no information why or what might
have gone wrong and it was also impossible to -view-cfg without recompiling. This
change makes debugging bugs that result in incorrect IR a lot easier.
llvm-svn: 261320
After we moved isl_ctx into Scop, we need to free the isl_ctx after
freeing all isl objects, which requires the ScopInfo pass to be freed
last. But this is not guaranteed by the PassManager, and we need
extra code to free the isl_ctx at the right time.
We introduced a shared pointer to manage the isl_ctx, and distribute
it to all analyses that create isl objects. As such, whenever we free
an analysis with the shared_ptr (and also free the isl objects which
are created by the analysis), we decrease the (shared) reference
counter of the shared_ptr by 1. Whenever the reference counter reaches
0 in the releaseMemory function of an analysis, that analysis is
the last one that holds any isl objects, and we can safely free the
isl_ctx with that analysis.
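The ownership scheme can be sketched with a shared_ptr and a custom deleter. `FakeCtx`, `freeCtx`, `makeCtx` and `Analysis` are hypothetical stand-ins; the real code manages an isl_ctx, which is not used here to keep the example self-contained.

```cpp
#include <cassert>
#include <memory>

// Stand-in for isl_ctx. The flag records when the context was freed.
struct FakeCtx {
  bool *FreedFlag;
};

// Custom deleter, playing the role of the function that frees the context.
void freeCtx(FakeCtx *Ctx) {
  assert(Ctx && "deleter is only called for a non-null context");
  *Ctx->FreedFlag = true;
  delete Ctx;
}

std::shared_ptr<FakeCtx> makeCtx(bool &Freed) {
  return std::shared_ptr<FakeCtx>(new FakeCtx{&Freed}, freeCtx);
}

// Hypothetical analysis that keeps the context alive while it owns objects.
struct Analysis {
  std::shared_ptr<FakeCtx> Ctx;
  // A real analysis would free its isl objects here before dropping the
  // reference, so the context always outlives the objects created in it.
  void releaseMemory() { Ctx.reset(); }
};
```

Whichever analysis releases its memory last triggers the deleter, so the order in which the pass manager frees the analyses no longer matters.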
Differential Revision: http://reviews.llvm.org/D17241
llvm-svn: 261100
We now distinguish invariant loads to the same memory location if they
have different types. This will cause us to pre-load an invariant
location once for each type that is used to access it. However, we can
thereby avoid invalid casting, especially if an array is accessed
through different typed/sized invariant loads.
This basically reverts the changes in r260023 but keeps the test
cases.
llvm-svn: 260045
We also disable this feature by default, as there are still some issues in
combination with invariant load hoisting that slipped through my initial
testing.
llvm-svn: 260025
Always use access-instruction pointer type to load the invariant values.
Otherwise mismatches between ScopArrayInfo element type and memory access
element type will result in invalid casts. These type mismatches are after
r259784 a lot more common and also arise with types of different size, which
have not been handled before.
Interestingly, this change actually simplifies the code, as we now have only
one code path that is always taken, rather than a standard code path for the
common case and a "fixup" code path that replaces the standard code path in
case of mismatching types.
llvm-svn: 260009
This allows code such as:
  void multiple_types(char *Short, char *Float, char *Double) {
    for (long i = 0; i < 100; i++) {
      Short[i] = *(short *)&Short[2 * i];
      Float[i] = *(float *)&Float[4 * i];
      Double[i] = *(double *)&Double[8 * i];
    }
  }
To model such code we use as canonical element type of the modeled array the
smallest element type of all original array accesses, if type allocation sizes
are multiples of each other. Otherwise, we use a newly created iN type, where N
is the gcd of the allocation size of the types used in the accesses to this
array. Accesses with types larger than the canonical element type are modeled as
multiple accesses with the smaller type.
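The choice of the canonical element type can be sketched as below. The names `CanonicalElement`, `gcdOf` and `canonicalElement` are hypothetical, and sizes are assumed to be given in bits; the actual implementation works on LLVM types and their allocation sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Euclid's algorithm.
unsigned gcdOf(unsigned A, unsigned B) {
  while (B != 0) {
    unsigned T = A % B;
    A = B;
    B = T;
  }
  return A;
}

struct CanonicalElement {
  unsigned Bits;       // size of the canonical element type
  bool IsExistingType; // true if the smallest accessed type can be reused
};

// Pick the canonical element size for all access sizes seen for one array.
CanonicalElement canonicalElement(const std::vector<unsigned> &AccessBits) {
  assert(!AccessBits.empty());
  unsigned Smallest = *std::min_element(AccessBits.begin(), AccessBits.end());
  unsigned Gcd = 0;
  for (unsigned Bits : AccessBits)
    Gcd = gcdOf(Gcd, Bits);
  // If all sizes are multiples of the smallest one, the gcd equals it and
  // the smallest type is used; otherwise a new iN type of gcd size is needed.
  return {Gcd, Gcd == Smallest};
}
```

Under these assumptions, accesses of 8 and 32 bits keep the 8-bit type, while accesses of 24 and 32 bits would fall back to a new 8-bit integer type.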
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support code-generating these memory accesses, we introduce a new method
getAccessAddressFunction that assigns each statement instance a single memory
location, the address we load from/store to. Currently we obtain this address by
taking the lexmin of the access function. We may consider keeping track of the
memory location more explicitly in the future.
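For the one-dimensional case, the relation between the modeled element range and the address used for code generation can be sketched like this. `ElementRange`, `accessedElements` and `accessAddress` are hypothetical names, and integers stand in for the isl maps involved.

```cpp
#include <cassert>

// Inclusive range of canonical-type elements touched by one access.
struct ElementRange {
  long First;
  long Last;
};

// Elements covered when statement instance I accesses a value of AccessBits
// bits starting at element Scale * I of an array whose canonical element
// type has CanonicalBits bits.
ElementRange accessedElements(long I, long Scale, unsigned AccessBits,
                              unsigned CanonicalBits) {
  assert(CanonicalBits > 0 && AccessBits % CanonicalBits == 0);
  long Count = AccessBits / CanonicalBits;
  return {Scale * I, Scale * I + Count - 1};
}

// The single address to load from or store to: the lexicographic minimum
// of the range, which in one dimension is simply its first element.
long accessAddress(const ElementRange &R) { return R.First; }
```

For the float load in the example (32-bit access, 8-bit canonical type, offset 4 * i), instance i = 2 covers elements 8 to 11, matching 4i0 <= o0 <= 3 + 4i0, and the address used is element 8.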
We currently do _not_ handle multi-dimensional arrays and also keep the
restriction of not supporting accesses where the offset expression is not a
multiple of the access element type size. This patch adds tests that ensure
we correctly invalidate a scop in case these accesses are found. Both types of
accesses can be handled using the very same model, but are left to be added in
the future.
We also move the initialization of the scop-context into the constructor to
ensure it is already available when invalidating the scop.
Finally, we add this as a new item to the 2.9 release notes.
Reviewers: jdoerfert, Meinersbur
Differential Revision: http://reviews.llvm.org/D16878
llvm-svn: 259784