The pattern recognition for MatMul is restrictive.
The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).
This was changed and made more flexible.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36460
llvm-svn: 311302
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.
We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
int i,j;
for (j = 0; j < Ni; j++) {
for (int i = 0; i<Nj; i++)
S: B[i] = i;
for (int i = 0; i<Nj; i++)
T: D[i] = i;
U: A[j] = B[j];
C[j] = D[j];
}
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.
This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.
We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.
Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur, simbuerg
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36791
llvm-svn: 311165
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void \
bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.
llvm-svn: 311118
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsover.
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.
We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: mgorny, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36800
llvm-svn: 311066
ReportLoopHasNoExit started failing after r310940 that added
infinite loops to postdominators. The change made regions not
contain infinite loops anymore.
This patch unbreaks the polly tree by XFAILING the
ReportLoopHasNoExit test. Full fix is under review in D36776.
llvm-svn: 310980
Summary:
Before, if we fail to parse a jscop file, this will be reported as an
error and importing is aborted. However, this isn't actually strong
enough, since although the import is aborted, the scop has already been
modified and is very likely broken. Instead, make this a hard failure
and throw an LLVM error. This new behaviour requires small changes to
the tests for the legacy pass, namely using `not` to verify the error.
Further, fixed the jscop file for the
base_pointer_load_is_inst_inside_invariant_1 testcase.
Reviewed By: Meinersbur
Split out of D36578.
llvm-svn: 310599
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.
Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.
This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
llvm-svn: 310466
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.
In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.
To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D35761
llvm-svn: 310380
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36278
llvm-svn: 310374
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
It is possible that partial writes are empty (write is never executed).
In this case, when in PHINode's incoming edge is never taken such that
the incoming write becomes an empty partial write, if enabled. The
issue is that when converting the union_map to an map, it's space
cannot be derived from the union_map itself. Rather, we need to
determine its space independently.
This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.
llvm-svn: 310348
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.
llvm-svn: 310338
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
llvm-svn: 310311
This commit implements the initial version of fully-indexed static
expansion.
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[j] = j;
T: A[i] = B[i]
```
After the pass, we want this :
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[i][j] = j;
T: A[i] = B[i][i]
```
For now we bail (fail) in the following cases:
- Scalar access
- Multiple writes per SAI
- MayWrite Access
- Expansion that leads to an access to the original array
Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.
Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
Reviewers: simbuerg, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: mgorny, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D34982
llvm-svn: 310304
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D36372
llvm-svn: 310196
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.
The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.
Differential Revision: https://reviews.llvm.org/D36290
llvm-svn: 310193
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.
To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36243
llvm-svn: 309939
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.
The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.
We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
llvm-svn: 309934
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**
When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.
The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36161
llvm-svn: 309822
The previous algorithm was to search a writes and the sours of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).
The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.
We now also support if the write is a partial write of the read's
domain, and to some extent non-affine subregions.
Differential Revision: https://reviews.llvm.org/D36137
llvm-svn: 309734
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between access can become
complex such that isl takes more considerable time and memory
for operations on them.
The problems identified were:
- is_subset() takes considerable time with many disjoints in the
arguments. We limit the number of disjoints to 4, any additional
information is thrown away.
- subtract() can lead to many disjoints. We instead assume that any
array element is possibly accessed, which removes all disjoints.
- subtract_domain() may lead to considerable processing, even if all
elements are are to be removed. Instead, we remove determine and
remove the affected spaces manually. No behaviour is changed.
llvm-svn: 309728
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.
Differential Revision: https://reviews.llvm.org/D36001
llvm-svn: 309681
This allows -polly-optree to move instructions that depend on
synthesizable values.
The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.
At the moment we cannot forward PHI nodes such that crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks: for entering the loop and the
backedge, such any instruction to be forwarded must have a phi between
use and definition).
A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.
Differential Revision: https://reviews.llvm.org/D36102
llvm-svn: 309609
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).
llvm-svn: 309582
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.
Reviewers: bollu, singam-sanjay
Reviewed By: bollu
Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35703
llvm-svn: 309560
This reverts commit r309490 as it triggers on our AOSP buildbut error messages
of the form:
inlinable function call in a function with debug info must have a !dbg location
llvm-svn: 309556
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**
When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.
The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: grosser, bollu
Subscribers: pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35630
llvm-svn: 309490
Write coalescing combines write accesses that
- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
writes), write to the same element.
- There is no other access between them that accesses the same element.
This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.
Differential Revision: https://reviews.llvm.org/D36010
llvm-svn: 309489
After region exit simplification, the incoming block of a phi node in
the SCoP region's exit block lands outside of the region. Since we
treat SCoPs as if this already happened, we need to account for that
when looking for outside uses of scalars (i.e. escaping scalars).
llvm-svn: 309271
A PHI node's incoming block is the user of its operand, not the PHI's parent.
Assuming the PHINode's parent being the user lead to the removal of a
MemoryAccesses because its use was assumed to be inside of the SCoP.
llvm-svn: 309164
In the following loop:
int i;
for (i = 0; i < func(); i+=1)
;
SCoP:
for (int j = 0; j<n; j+=1)
S(i, j)
The value i is synthesizable in the SCoP that includes only the j-loop.
This is because i is fixed within the SCoP, it is irrelevant whether
it originates from another loop.
This fixes a strange case where a PHI was synthesiable in a SCoP,
but not its incoming value, triggering an assertion.
This should fix MultiSource/Applications/sgefa/sgefa of the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable buildbot.
llvm-svn: 309109
Summary:
This consists instances of two changes:
- Accept any order of checks for a specific loop form, that appear in different order in the new vs legacy-PM.
- Remove checks for specific regions.
Reviewers: grosser
Reviewed By: grosser
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35837
llvm-svn: 308976
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.
Invariant load hoisted scalars are sent to the kernel directly as parameters.
Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.
Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.
Differential Revision: https://reviews.llvm.org/D35795
llvm-svn: 308971
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.
Differential Revision: https://reviews.llvm.org/D35764
llvm-svn: 308876
Summary:
- We were using `.count` in `StringRef`, which matches substrings.
- We may want to use this for equality as well.
- Generalise this, so allow regexes as a parameter to `polly-only-func`.
Differential Revision: https://reviews.llvm.org/D35728
llvm-svn: 308875
If the access relation's domain is empty, the access will never be
executed. We can just remove it.
We only remove write accesses. Partial read accesses are not yet
supported and instructions in the statement might require the
llvm::Value holding the read's result to be defined.
llvm-svn: 308830
Hoisted loads can be trivially supported because there are no
MemoryAccess to be modified, the loaded value is just available
at code generation.
llvm-svn: 308826
This pass 'forwards' operand trees into statements that use them in
order to avoid scalar dependencies.
This minimal implementation handles only the case of speculatable
instructions. We will successively add support for:
- Hoisted loads
- Read-only values
- Synthesizable values
- Loads
- PHIs
- Forwarding only parts of the tree
Differential Revision: https://reviews.llvm.org/D35754
llvm-svn: 308825
Summary:
For the ScopInfo lit testsuite, this patch removes some dependences on output behaviour of the legacy PM.
In most cases, these tests checked the tool output for labels created by the pass printer in the legacy PM. This doesn't work for the new PM anymore. Untangling the testcases is the first step to porting the testsuite for the new PM infrastructure.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35727
llvm-svn: 308754
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).
Reviewers: bollu, grosser, Meinersbur, singam-sanjay
Reviewed By: grosser, singam-sanjay
Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35185
llvm-svn: 308751
When performing invariant load hoisting we check that invariant load expressions
are not too complex. Up to this commit, we performed this check by counting the
sum of dimensions in the access range as a very simple heuristic. This heuristic
is a little too conservative, as it prevents hoisting for any scops with a
very large number of parameters. Hence, we update the heuristic to only count
existentially quantified dimensions and set dimensions. We expect this to still
detect the problematic expressions in h264 because of which this check was
originally introduced.
For some unknown reason, this complexity check was originally committed in
IslNodeBuilder. It really belongs in ScopInfo, as there is no point in
optimizing a program which we could have known earlier cannot be code generated.
The benefit of running the check early is that we can avoid to even hoist checks
that are expensive to code generate as invariant loads. This can be seen in
the changed tests, where we now indeed detect the scop, but just not invariant
load hoist the complicated access.
We also improve the formatting of the code, document it, and use isl++ to
simplify expressions.
llvm-svn: 308659
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.
It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.
ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.
Differential Revision: https://reviews.llvm.org/D35656
llvm-svn: 308626
This commit *WILL COMPILE*.
1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
This needs us to adjust how we index array bounds and how we construct
array bounds.
2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
We should investigate what the correct way to handle these are.
3. `PPCG` has gotten smarter with its use of live range reordering, so some of
the tests have a qualitative improvement.
4. `PPCG` changed its output style, so many test cases need to be updated to
fit the new style for `polly-acc-dump-code` checks.
Differential Revision: https://reviews.llvm.org/D35677
llvm-svn: 308625
This is one possible solution to implement wrap-arounds for integers in
unsigned icmp operations. For example,
store i32 -1, i32* %A_addr
%0 = load i32, i32* %A_addr
%1 = icmp ult i32 %0, 0
%1 should hold false, because under the assumption of unsigned integers,
-1 should wrap around to 2^32-1. However, previously. it was assumed
that the MSB (Most Significant Bit - aka the Sign bit) was never set for
integers in unsigned operations.
This patch modifies the buildConditionSets function in ScopInfo.cpp to
give better information about the integers in these unsigned
comparisons.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35464
llvm-svn: 308608
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D33138
llvm-svn: 308494
Utilizing newer LLVM diagnostic remark API in order to enable use of
opt-viewer tool. Polly Diagnostic Remarks also now appear in YAML
remark file.
In this patch, I've added the OptimizationRemarkEmitter into certain
classes where remarks are being emitted and update the remark emit calls
itself. I also provide each remark a BasicBlock or Instruction from where
it is being called, in order to compute the hotness of the remark.
Patch by Tarun Rajendran!
Differential Revision: https://reviews.llvm.org/D35399
llvm-svn: 308233
Summary:
We do not keep domain constraints on access functions when building the
scop. Hence, for consistency reasons, it makes also sense to not include
them when storing a new access function. This change results in simpler
access functions that make output easier to read.
This patch also helps to make DeLICMed memory accesses to be understood by
our matrix multiplication pattern matching pass. Further changes to the
matrix multiplication pattern matching are needed for this to work, so the
corresponding test case will be added in a future commit.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35237
llvm-svn: 308215
- We should call `preloadInvariantLoads` to make sure that code is
generated for invariant loads in the kernel.
Differential Revision: https://reviews.llvm.org/D35410
llvm-svn: 308187
This patch makes sure that in case a loop is not fully contained within a region
that later forms a SCoP, none of the loop backedges are allowed to be part of
the region. We currently do not support the situation where only some of a loops
backedges are part of a scop. Today, this can break both scop modeling and code
generation. One such breaking test case is for example
test/ScopDetectionDiagnostics/loop_partially_in_scop-2.ll, where we totally
forgot to code generate some of the backedges. Fortunately, it is commonly not
necessary to support these partial loops, it is way more common that either
no backedge is included in a region or all loop backedge are included.
This fixes a recent miscompile in
MultiSource/Benchmarks/MiBench/consumer-typeset which was exposed after
r306477.
llvm-svn: 308113
- There is a conditional branch that is used to switch between the old
and new versions of the code.
- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
change the runtime check to be always set to false.
- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
was using assumptions about the layout of the BBs.
- However, invariant load hoisting violates this assumption by inserting
an extra basic block in the middle.
- Fix the assumption on the layout by having `createScopConditionally`
return the conditional branch instruction.
- Use this reference to set to always-false.
llvm-svn: 308010
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.
During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.
Differential Revision: https://reviews.llvm.org/D35120
llvm-svn: 307907
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.
Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.
Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: mehdi_amini, nemanjai, pollydev, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35176
llvm-svn: 307814
- `lit.util.capture` was removed in `r306625`.
- Replace `lit.util.capture` to `subprocess.check_output` as LLVM did.
- LLVM revision of this change: `https://reviews.llvm.org/D35088`.
Differential Revision: https://reviews.llvm.org/D35255
llvm-svn: 307765
Summary:
There is a bug in the current lit configurations for the unittests. If gtest is not available, the site-config for the unit tests won't be generated. Because lit recurses through the test directory, the lit configuration for the unit tests will be discovered nevertheless, leading to a fatal error in lit.
This patch semi-gracefully skips the unittests if gtest is not available. As a result, running lit now prints this: `warning: test suite 'Polly-Unit' contained no test`.
If people think that this is too annoying, the alternative would be to pick apart the test directory, so that the lit testsuite discovery will always only find one configuration. In fact, both of these things could be combined. While it's certainly nice that running a single lit command runs all the tests, I suppose people use the `check-polly` make target over lit most of the time, so the difference might not be noticed.
Reviewers: Meinersbur, grosser
Reviewed By: grosser
Subscribers: mgorny, bollu, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D34053
llvm-svn: 307651