Commit Graph

2709 Commits

Author SHA1 Message Date
Tobias Grosser 4b5e24df9a cmake: avoid "zero-length gnu_printf format string" warning in gcc 6.1.1
Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk>
llvm-svn: 284302
2016-10-15 05:08:12 +00:00
Michael Kruse fa53c86dc1 [ScopInfo/CodeGen] ExitPHI reads are implicit.
Under some conditions MK_Value read accessed where converted to MK_ExitPHI read
accessed. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.

Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.

Do not convert value accessed to ExitPHI accesses and do not handle
value accesses like ExitPHI accessed in CodeGen anymore.

llvm-svn: 284023
2016-10-12 16:31:09 +00:00
Michael Kruse c9edc2ee8d [DepInfo] Print -debug output outside of max-operations scope.
ISL tries to simplify the polyhedral operations before printing its objects.
This increases the operations counter and therefore can contribute to hitting
the operations limit. Therefore the result could be different when -debug output
is enabled, making debugging harder.

llvm-svn: 283745
2016-10-10 11:45:59 +00:00
Michael Kruse 8bfba1ff46 [Support/DepInfo] Introduce IslMaxOperationsGuard and make DepInfo use it. NFC.
IslMaxOperationsGuard defines a scope where ISL may abort operations because if
it takes too many operations. Replace the call to the raw ISL interface by a
use of the guard.

IslMaxOperationsGuard provides a uniform way to define a maximal computation
time for a code region in C++ using RAII.

llvm-svn: 283744
2016-10-10 11:45:54 +00:00
Tobias Grosser b270288752 Fix formatting after recent cl:: changes
This fixes 'make check-polly'

llvm-svn: 283693
2016-10-09 08:31:35 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Hongbin Zheng 5860aef675 Define PATH_MAX on windows
Differential Revision: https://reviews.llvm.org/D25372

llvm-svn: 283600
2016-10-07 20:58:20 +00:00
Michael Kruse de42b43de0 [cmake] Unify disabling upstream project warnings.
Handle MSVC, ISL and PPCG in one place. The only functional change is that
warnings are also disabled for MSVC compiling PPCG (Which currently fails
anyway).

llvm-svn: 283547
2016-10-07 12:38:32 +00:00
Michael Kruse 4b5f6af2dc [cmake] Move isl_test artifacts to Polly folder.
Folders in Visual Studio solutions help organize the build artifacts from all
LLVM projects. There is a folder to keep Polly-built files in.

llvm-svn: 283546
2016-10-07 12:38:24 +00:00
Tobias Grosser e84ee850d1 Build and run isl_test as part of check-polly
Running isl tests is important to gain confidence that the isl build we created
works as expected. Besides the actual isl tests, there are also isl AST
generation tests shipped with isl. This change only adds support for the isl
unit tests. AST generation test support is left for a later commit.

There is a choice to run tests directly through the build system or in the
context of lit. We choose to run tests as part of lit to as this allows us to
easily set environment variables, print output only on error and generally run
the tests directly from the lit command.

Reviewers: brad.king, Meinersbur

Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D25155

llvm-svn: 283245
2016-10-04 19:48:40 +00:00
Michael Kruse 6ab4476835 [ScopInfo] Add -polly-unprofitable-scalar-accs option.
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. Such a statement instances
necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be
changed to array accesses, which can avoid these WAW-dependence.

llvm-svn: 283233
2016-10-04 17:33:39 +00:00
Michael Kruse ca7cbcca37 [ScopInfo] Scalar access do not have indirect base pointers.
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.

This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test SAI tried to get the origin base pointer that was only declared later,
therefore not existing. This is probably only possible for scalars used in
PHINode incoming blocks.

llvm-svn: 283232
2016-10-04 17:33:34 +00:00
Michael Kruse 9116899615 [ScopInfo] Make simplifySCoP() public. NFC.
This function may need to be called after the scop construction. The upcoming
DeLICM will use this to cleanup statement that all write accesses have been
removed from.

llvm-svn: 283221
2016-10-04 14:14:33 +00:00
Tobias Grosser 5652c9c830 isl: update to isl-0.17.1-233-gc911e6a
llvm-svn: 283049
2016-10-01 19:46:51 +00:00
Michael Kruse 4b0c5aea78 [CodeGen] Add assertion for indirect array index expression generation. NFC.
Currently Polly cannot generate code for index expressions if the base pointer
is computed within the scop. The base pointer must be generated as well, but
there is no code that triggers that.

Add an assertion to detect when this would occur and miscompile. The IR verifier
should catch it as well.

llvm-svn: 282893
2016-09-30 18:29:37 +00:00
Michael Kruse f6f795e9ae [Support] Complete ISL annotations to IslPtr<>. NFC.
Add missing __isl_(give/take/keep) annotations to IslPtr<> and NonowningIslPtr<>
methods.

Because IslPtr's constructor's annotation would depend on the TakeOwnership
parameter, the parameter has been removed. Caller must copy the object
themselves if the do not want to take ownership.

llvm-svn: 282883
2016-09-30 17:47:39 +00:00
Michael Kruse 51f514d853 [Support] Compile fix for gcc. NFC.
gcc 5.4 insists on template specialization to be in a namespace polly { ... }
block, instead of being prefixed with 'polly::'. Error message:

root/src/llvm/tools/polly/lib/Support/GICHelper.cpp:203:54: error: specialization of ‘template<class T> void polly::IslPtr<T>::dump() const’ in different namespace [-fpermissive]
   template <> void polly::IslPtr<isl_##TYPE>::dump() const {                   \
                                                      ^
msvc14 and clang 3.8 did not complain.

llvm-svn: 282874
2016-09-30 16:47:43 +00:00
Michael Kruse 55519dad62 [Support] Add (Nonowning-)IslPtr::dump(). NFC.
The dump() methods can be called from a debugger instead of e.g.

    isl_*_dump(Var.Obj)

where Var is a variable of type IslPtr/NonowningIslPtr. To ensure that the
existence of the function pointers do not depdend on whether the methods are
used somwhere, they are declared with external linkage.

llvm-svn: 282870
2016-09-30 16:10:19 +00:00
Michael Kruse 32312d0294 [Support] Call isl_*_free() only on non-null pointers. NFC.
Add a non-NULL check before calling the free function into functions that are
supposed to be inlined. First, this is a form of partial inlining of the free
function, namely the nullptr test that free has to do. Secondly, and more
importantly, it allows the compiler to remove the call to isl_*_free() when it
knows that the object is nullptr, for instance because the last call is a
take(). "Consuming" the last use of an ISL object using take()
(instead of copy()) is a common pattern.

llvm-svn: 282864
2016-09-30 15:29:43 +00:00
Michael Kruse 888ab55140 [CodeGen] Change 'Scalar' to 'Array' in method names. NFC.
generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array)
memory accesses, therefore the method names were misleading. The names also
were similar to generateScalarLoads() and generateScalarStores() (plural forms)
which indeed handle scalar accesses. Presumbly, they were originally named to
contrast VectorBlockGenerator::generateLoad().

Rename the two methods to generateArrayLoad(),
respectively generateArrayStore().

llvm-svn: 282861
2016-09-30 14:34:05 +00:00
Michael Kruse 77394f1394 [CodeGen] Add assertion for partial scalar accesses. NFC.
The code generator always adds unconditional LoadInst and StoreInst, hence the
MemoryAccess must be defined over all statement instances.

llvm-svn: 282853
2016-09-30 14:01:46 +00:00
Tobias Grosser b110296011 www: Add Loopy publication
llvm-svn: 282747
2016-09-29 18:17:30 +00:00
Tobias Grosser ae91bc7353 www: add new code coverage link to Polly website
llvm-svn: 282351
2016-09-25 08:03:38 +00:00
Tobias Grosser 349d1c3368 [ScopDetection] Remove redundant checks for endless loops
Summary:
Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks
to reject endless loops which are now removed and replaced by a single check
in `isValidLoop()`.

For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is
renamed to `ReportLoopHasNoExit`. The test case
`ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well.

The schedule generation in `buildSchedule()` is based on the following
assumption:

Given some block B that is contained in a loop L and a SESE region R,
we assume that L is contained in R or the other way around.

However, this assumption is broken in the presence of endless loops that are
nested inside other loops. Therefore, in order to prevent erroneous behavior
in `buildSchedule()`, r265280 introduced a corresponding check in
`canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible
to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding
another check to reject endless loops in `allowOverApproximatedRegion()` in
r273905. Hence there existed two separate locations that handled this case.

Thank you Johannes Doerfert for helping to provide the above background
information.

Reviewers: Meinersbur, grosser

Subscribers: _jdoerfert, pollydev

Differential Revision: https://reviews.llvm.org/D24560

Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 281987
2016-09-20 17:05:22 +00:00
Tobias Grosser 122d6d74f6 Fix spelling in CMakeLists
llvm-svn: 281897
2016-09-19 10:55:31 +00:00
Tobias Grosser 05ee64e67a GPGPU: add missing REQUIRES line to test case
llvm-svn: 281850
2016-09-18 08:57:38 +00:00
Tobias Grosser bc653f2031 GPGPU: Do not run mostly sequential kernels in GPU
In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.

llvm-svn: 281849
2016-09-18 08:31:09 +00:00
Tobias Grosser 82f2af3508 GPGPU: Dynamically ensure 'sufficient compute'
Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small number
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.

llvm-svn: 281848
2016-09-18 06:50:35 +00:00
Tobias Grosser cfdee6582b GPGPU: Make test cases independent of register numbering [NFC]
llvm-svn: 281847
2016-09-18 06:50:28 +00:00
Tobias Grosser 51dfc27589 GPGPU: Store back non-read-only scalars
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that backs them up.

We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.

llvm-svn: 281838
2016-09-17 19:22:31 +00:00
Tobias Grosser fe74a7a1f5 GPGPU: Detect read-only scalar arrays ...
and pass these by value rather than by reference.

llvm-svn: 281837
2016-09-17 19:22:18 +00:00
Tobias Grosser 8f86a47461 Update CFGPrinter -> CFGPrinterLegacyPass
.. to match recent changes in LLVM that broke the Polly compilation.

llvm-svn: 281705
2016-09-16 05:48:09 +00:00
Tobias Grosser aaabbbf886 GPGPU: Do not assume arrays start at 0
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: In case array are accessed with negative
subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
2016-09-15 14:05:58 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Tobias Grosser e8c69bbabd cmake: PollyPPCG depends on PollyISL
This line makes BUILD_SHARED_LIBS=ON work for Polly-ACC. Without it, ld
complains about missing isl symbols when constructing the shared library.

llvm-svn: 281396
2016-09-13 21:09:35 +00:00
Tobias Grosser 0a893f7df4 GPGPU: Use const_cast to avoid compiler warning [NFC]
llvm-svn: 281333
2016-09-13 13:22:27 +00:00
Michael Kruse 19c9d99f45 Use value directly instead of reference. NFC.
The alias to the array element is read-only and a primitive type (pointer),
therefore use the value directly instead of a reference to it.

llvm-svn: 281311
2016-09-13 09:56:05 +00:00
Tobias Grosser a82c4b5df8 GPGPU: Allow region statements
llvm-svn: 281305
2016-09-13 08:42:10 +00:00
Tobias Grosser b79f4d3970 GPGPU: Extend types when array sizes have smaller types
This prevents a compiler crash.

llvm-svn: 281303
2016-09-13 08:02:14 +00:00
Tobias Grosser b51d507c74 Adapt test case to recent change in Global Variable Definition
llvm-svn: 281295
2016-09-13 05:19:26 +00:00
Michael Kruse e5e752a28b Remove -fvisibility=hidden and FORCE_STATIC.
The flag -fvisibility=hidden flag was used for the integrated Integer
Set Library (and PPCG) to keep their definitions local to Polly. The
motivation was the be loaded into a DragonEgg-powered GCC, where GCC
might itself use ISL for its Graphite extension. The symbols of Polly's
ISL and GCC's ISL would clash.

The DragonEgg project is not actively developed anymore, but Polly's
unittests need to call ISL functions to set up a testing environment.
Unfortunately, the -fvisibility=hidden flag means that the ISL symbols
are not available to the gtest executable as it resides outside of
libPolly when linked dynamically. Currently, CMake links a second copy
of ISL into the unittests which leads to subtle bugs. What got observed
is that two isl_ids for isl_id_none exist, one for each library
instance. Because isl_id's are compared by address, isl_id_none could
happen to be different from isl_id_none, depending on which library
instance set the address and does the comparison.

Also remove the FORCE_STATIC flag which was introduced to keep the ISL
symbols visible inside the same libPolly shared object, even when build
with BUILD_SHARED_LIBS.

Differential Revision: https://reviews.llvm.org/D24460

llvm-svn: 281242
2016-09-12 18:25:00 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5857b701a3 GPGPU: Bail out gracefully in case of invalid IR
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.

llvm-svn: 281193
2016-09-12 06:06:31 +00:00
Tobias Grosser 0bf4cc6499 Add missing 'REQUIRES' line
llvm-svn: 281166
2016-09-11 13:42:42 +00:00
Tobias Grosser 02293ed755 GPGPU: Do not fail in case of arrays never accessed
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.

llvm-svn: 281165
2016-09-11 13:30:12 +00:00
Tobias Grosser 89fda2bde2 GPURuntime: ensure compilation with C99
Otherwise, older compiler will error out on some of the C99 features we use.

llvm-svn: 281159
2016-09-11 07:32:50 +00:00
Tobias Grosser 5aea5653b3 FlattenAlgo: Ensure we _really_ obtain a param space
This resolves "isl_space.c:1775: not a parameter space" errors I have seen
on two systems.

llvm-svn: 281052
2016-09-09 16:11:26 +00:00
Tobias Grosser a6987a4ddd Add namespace specifier before nullptr_t
This fixes the following compile time errors:

  error: unknown type name 'nullptr_t'; did you mean 'std::nullptr_t'

llvm-svn: 281039
2016-09-09 12:31:38 +00:00
Tobias Grosser a3afe44d6c IslNodeBuilder: Add missing __isl_take annotation
llvm-svn: 281034
2016-09-09 11:16:50 +00:00
Michael Kruse 7886bd7ca5 Add -polly-flatten-schedule pass.
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.

To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:

    { Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
      Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
      Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
      Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
      Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }

And here the same lifetime for a semantically identical one-dimensional
schedule:

    { Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }

Differential Revision: https://reviews.llvm.org/D24310

llvm-svn: 280948
2016-09-08 15:02:36 +00:00