Commit Graph

232 Commits

Author SHA1 Message Date
Tobias Grosser 5e6813d184 Derive run-time conditions for delinearization
As our delinearization works optimistically, we need in some cases run-time
checks that verify our optimistic assumptions. A simple example is the
following code:

void foo(long n, long m, long o, double A[n][m][o]) {

  for (long i = 0; i < 100; i++)
    for (long j = 0; j < 150; j++)
      for (long k = 0; k < 200; k++)
        A[i][j][k] = 1.0;
}

After clang linearized the access to A and we delinearized it again to
A[i][j][k] we need to ensure that we do not access the delinearized array
out of bounds (this information is not available in LLVM-IR). Hence, we
need to verify the following constraints at run-time:

CHECK:   Assumed Context:
CHECK:   [o, m] -> {  : m >= 150 and o >= 200 }
llvm-svn: 212198
2014-07-02 17:47:48 +00:00
Johannes Doerfert f618339a37 Introduce reduction types
This change is particularly useful in the code generation as we need
  to know which binary operator/identity element we need to combine/initialize
  the privatization locations.

  + Print the reduction type for each memory access
  + Adjusted the test cases to comply with the new output format and
    to test for the right reduction type

llvm-svn: 212126
2014-07-01 20:52:51 +00:00
Johannes Doerfert 9890a05287 [FIX] Don't consider reductions which are partially outside the SCoP
+ Test case

llvm-svn: 212080
2014-07-01 00:32:29 +00:00
Johannes Doerfert 1a62c7a34a [Fix] Deleted renamed test after r211957
llvm-svn: 211964
2014-06-27 21:48:42 +00:00
Johannes Doerfert e58a012094 Allow multiple reductions per statement
Iterate over all store memory accesses and check for valid binary reduction
  candidate loads by following the operands of the stored value.  For each
  candidate pair we check if they have the same base address and there are no
  other accesses which may overlap with them. This ensures that no intermediate
  value can escape into other memory locations or is overwritten at some point.

  + 17 test cases for reduction detection and reduction dependency modeling

llvm-svn: 211957
2014-06-27 20:31:28 +00:00
Andreas Simbuerger b379edbb3e Don't expand to invalid Scops with -polly-detect-keep-going
Enabling -keep-going in ScopDetection causes expansion to an invalid
Scop candidate.

Region A     <- Valid candidate
   |
Region B     <- Invalid candidate

If -keep-going is enabled, ScopDetection would expand A to A+B because
the RejectLog is never checked for errors during expansion.

With this patch only A becomes a valid Scop.

llvm-svn: 211875
2014-06-27 06:21:14 +00:00
Johannes Doerfert 76dd493eff [Fix] Broken tests after r211796.
llvm-svn: 211797
2014-06-26 19:29:11 +00:00
Johannes Doerfert f8ee915deb Use wrapped reduction dependences
This change will ease the transision to multiple reductions per statement as
  we can now distinguish the effects of multiple reductions in the same
  statement.

  + Wrapped reduction dependences are used to compute privatization dependences
  + Modified test cases to account for the change

llvm-svn: 211795
2014-06-26 18:44:14 +00:00
Johannes Doerfert ea23b1d561 Hybrid dependency analysis
This dependency analysis will keep track of memory accesses if they might be
  part of a reduction. If not, the dependences are tracked on a statement level.
  The main reason to do this is to reduce the compile time while beeing able to
  distinguish the effects of reduction and non-reduction accesses.

  + Adjusted two test cases

llvm-svn: 211794
2014-06-26 18:38:08 +00:00
Andreas Simbuerger 99d4ab2b84 Add diagnostic remark for ReportVariantBasePtr
llvm-svn: 211777
2014-06-26 13:33:35 +00:00
Andreas Simbuerger 5569bf300d Support the new DiagnosticRemarks
Add support for generating optimization remarks after completing the
detection of Scops.
The goal is to provide end-users with useful hints about opportunities that
help to increase the size of the detected Scops in their code.

By default the remark is unspecified and the debug location is empty. Future
patches have to expand on the messages generated.

This patch brings a simple test case for ReportFuncCall to demonstrate the
feature.

Reports all missed opportunities to increase the size/number of valid
Scops:
 clang <...> -Rpass-missed="polly-detect" <...>
 opt <...> -pass-remarks-missed="polly-detect" <...>

Reports beginning and end of all valid Scops:
 clang <...> -Rpass="polly-detect" <...>
 opt <...> -pass-remarks="polly-detect" <...>

Differential Revision: http://reviews.llvm.org/D4171

llvm-svn: 211769
2014-06-26 10:06:40 +00:00
Tobias Grosser 50a5e6dac0 test/ScopInfo: Remove %defaultOpts and list passes explicitly
Due to bad habit we sometimes used a variable %defaultOpts that listed
a set of passes commonly run to prepare for Polly. None of these test cases
actually needs special preparation and only two of them need the 'basicaa' to
be scheduled. Scheduling the required alias analysis explicitly makes the test
cases clearer.

llvm-svn: 211671
2014-06-25 06:38:18 +00:00
Tobias Grosser 08031390d5 Clean up XFAILed test cases
We had a set of test cases that have been incomplete and XFAILED. This patch
completes a couple of the interesting ones and removes the ones which seem
redundant or not sufficiently reduced to be useful.

llvm-svn: 211670
2014-06-25 06:31:19 +00:00
Johannes Doerfert 5e275bc83a [Refactor] Create nicer test cases from C/C++
Insert a header into the new testcase containing a sample RUN line a FIXME and
an XFAIL. Then insert the formated C code and finally the LLVM-IR without
attributes, the module ID or the target triple.

llvm-svn: 211612
2014-06-24 17:02:53 +00:00
Yabin Hu cc91169fd7 Remove use of llvm.codegen intrinsic for GPGPU codegen
We use llvm.codegen intrinsic to generate code for embedded LLVM-IR
strings. The reason we introduce such a intrinsic is that previous
clang/opt tools was NOT linked with various LLVM targets and their
AsmParsers and AsmPrinters. Since clang/opt been linked with all the
needed libraries, we no longer need the llvm.codegen intrinsic.

llvm-svn: 211573
2014-06-24 08:11:36 +00:00
Johannes Doerfert f1906138b4 Model statement wise reduction dependences
+ Collect reduction dependences
+ Introduced TYPE_RED in Dependences.h which can be used to obtain the
  reduction dependences
+ Used TYPE_RED to prevent parallelization while we do not have a privatizing
  code generation
+ Relax the dependences for non-parallel code generation
+ Add privatization dependences to ensure correctness
+ 12 Test cases to check for reduction and privatization dependences

llvm-svn: 211369
2014-06-20 16:37:11 +00:00
Johannes Doerfert da80386700 Missing reduction detection test cases
llvm-svn: 211235
2014-06-18 23:08:14 +00:00
Tobias Grosser f4fcbf4097 Test delinearization of 2D diagonal matrix
llvm-svn: 210538
2014-06-10 14:48:17 +00:00
Tobias Grosser be7eaddc69 Adjust another test case to not access out of bounds
llvm-svn: 210208
2014-06-04 19:41:47 +00:00
Tobias Grosser 5416a0395f Adjust multidim test cases to not access out-of-bound memory
We do this currently only for test cases where we have integer offsets that
clearly access array dimensions out-of-bound.

-;   for (long i = 0; i < n; i++)
-;     for (long j = 0; j < m; j++)
-;       for (long k = 0; k < o; k++)
+;   for (long i = 0; i < n - 3; i++)
+;     for (long j = 4; j < m; j++)
+;       for (long k = 0; k < o - 7; k++)
 ;         A[i+3][j-4][k+7] = 1.0;

This will be helpful if we later want to simplify the access functions under the
assumption that they do not access memory out of bounds.

llvm-svn: 210179
2014-06-04 11:47:54 +00:00
Sebastian Pop 422e33f363 record delinearization result and reuse it in polyhedral translation
Without this patch, the testcase would fail on the delinearization of the second
array:

; void foo(long n, long m, long o, double A[n][m][o]) {
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++) {
;         A[i+3][j-4][k+7] = 1.0;
;         A[i][0][k] = 2.0;
;       }
; }

; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[3 + i0, -4 + i1, 7 + i2] };
; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[i0, 0, i2] };

Here is the output of FileCheck on the testcase without this patch:

; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[i0, 0, i2] };
         ^
<stdin>:26:2: note: possible intended match here
 [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[o0] };
 ^

It is possible to find a good delinearization for A[i][0][k] only in the context
of the delinearization of both array accesses.

There are two ways to delinearize together all array subscripts touching the
same base address: either duplicate the code from scop detection to first gather
all array references and then run the delinearization; or as implemented in this
patch, use the same delinearization info that we computed during scop detection.

llvm-svn: 210117
2014-06-03 18:16:31 +00:00
Johannes Doerfert c3958b214c Added option for n-dimensional rectangular tiling
+ CL-option --polly-tile-sizes=<int,...,int>
  The i'th value is used as a tile size for dimension i, if
  there is no i'th value, the value of --polly-default-tile-size is
  used

+ CL-option --polly-default-tile-size=int
  Used if no tile size is given for a dimension i

+ 3 Simple testcases

llvm-svn: 209753
2014-05-28 17:21:02 +00:00
Tobias Grosser 5f860fdfe9 Do not run GPGPU test cases without nvptx target
Tag the GPGPU codegen test cases as unsupported if the nvptx target is not
included in the current llvm build.

Contributed-by:  Yabin Hu <yabin.hwu@gmail.com>
llvm-svn: 208779
2014-05-14 14:18:14 +00:00
Sebastian Pop c5c1055e3f do not build llc and lli for polly test
llvm-svn: 208619
2014-05-12 19:43:20 +00:00
Sebastian Pop e8863b8f00 correct the delinearization failing case
collect terms from affine and non affine memory accesses

llvm-svn: 208616
2014-05-12 19:02:02 +00:00
Sebastian Pop fcf68758b8 unxfail passing testcase
llvm-svn: 208233
2014-05-07 18:01:32 +00:00
Tobias Grosser f56af204b9 Add delinearization testcase for ivs that do not follow the loop order
This is a test case that is currently failing, but that should start working
with an upcoming version of our delinearization pass.

llvm-svn: 207678
2014-04-30 17:49:22 +00:00
Tobias Grosser 841009a2cc We missed two files in the last commit.
Contributed-by:  Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 206901
2014-04-22 15:57:30 +00:00
Tobias Grosser 0d11dbabc4 Fixed missing cloog test with automake/configure build setup
Contributed-by:  Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 206900
2014-04-22 15:30:43 +00:00
Tobias Grosser 954939842f Really fix the load case.
Commit r206510 falsely advertised to fix the load cases, even though it only
fixed the store case. This commit adds the same fix for the load case including
the missing test coverage.

llvm-svn: 206577
2014-04-18 09:46:35 +00:00
Tobias Grosser 50fd7010d8 Ensure a scalar pointer when issuing a vector load
Even tough we may want to generate a vector load, the address from which to load
still is a scalar. Make sure even if previous address computations may have been
vectorized, that the addresses are also available as scalars.

This fixes http://llvm.org/PR19469

Reported-by:  Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206510
2014-04-17 23:13:49 +00:00
Tobias Grosser 75b76729ab Fix for vector codegen in OpenMP subfunctions
Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 206332
2014-04-15 22:30:06 +00:00
Tobias Grosser 364c136d08 Dependences: Do not fail in case a schedule eliminates all dependences
The following example shows a non-parallel loop

void f(int a[]) {
  int i;
  for (i = 0; i < 10; ++i)
    A[i] = A[i+5];
}

which, in case we import a schedule that limits the iteration domain
to 0 <= i < 5, becomes parallel. Previously we crashed in such cases, now we
just recognize it as parallel.

This fixes http://llvm.org/PR19435

Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206318
2014-04-15 20:14:57 +00:00
Tobias Grosser efc3013544 Codegeneration: Free memory correctly when using -polly-vectorizer=polly
This fixes PR19421.

Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206156
2014-04-14 08:33:24 +00:00
Sebastian Pop cd3bb59aa2 only delinearize when the access function is not affine
llvm-svn: 205971
2014-04-10 16:08:11 +00:00
Tobias Grosser 79baa21242 ScopInfo: Scalar accesses are zero dimensional
llvm-svn: 205958
2014-04-10 08:38:02 +00:00
Sebastian Pop 1801668af3 delinearize memory access functions
llvm-svn: 205799
2014-04-08 21:20:44 +00:00
Tobias Grosser 64b95123ef Delete trivial PHI nodes (aka stack slot sharing)
During code preperation trivial PHI nodes (mainly introduced by lcssa) are
deleted to decrease the number of introduced allocas (==> dependences). However
simply replacing them by their only incoming value would cause the independent
block pass to introduce new allocas. To prevent this we try to share stack slots
during code preperarion, hence to reuse a already created alloca 'to demote' the
trivial PHI node. This works if we know that the value stored in this alloca
will be the incoming value of the trivial PHI at the end of the predecessor
block of this trivial PHI.

Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 205320
2014-04-01 16:01:33 +00:00
Tobias Grosser 5fa36c0ff6 Updated test/create_ll.sh to work with old & new clang versions.
We explicitly specifying all filenames instead of assuming some naming
convention used by clang and opt.

Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 204726
2014-03-25 15:50:44 +00:00
Tobias Grosser e275e9216b Return conservative result in case the dependence check timed out
For complex examples it may happen that we do not compute dependences. In this
case we do not want to crash, but just not detect parallel loops.

llvm-svn: 204470
2014-03-21 15:12:09 +00:00
Tobias Grosser 0dd463facf Support for generating vectors for loads with -1 stride
This patch enables vectorization of loops containing backward array
traversal (array stride is -1).

Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
llvm-svn: 204257
2014-03-19 19:27:24 +00:00
Tobias Grosser 8111a0ae7d autoconf: Fix module loading in tests
llvm-svn: 203925
2014-03-14 13:27:26 +00:00
Sebastian Pop 7537be92f4 add -load polly.so only when not LINK_POLLY_INTO_TOOLS
llvm-svn: 203888
2014-03-14 04:04:36 +00:00
Rafael Espindola 80f20133d4 Fix polly tests to not include aliases to declarations.
llvm-svn: 203721
2014-03-12 21:48:42 +00:00
Sebastian Pop 1b57e8f028 add dependence of check-polly on llc
to avoid an error when directly doing ninja check-polly after cmake
'Could not find llc in .../ninja/bin'.

llvm-svn: 203696
2014-03-12 18:55:25 +00:00
Tobias Grosser 4ba60fe9eb ScheduleOptimizer: Fix prevectorization.
In case we are at the innermost band, we try to prepare for vectorization. This
means, we look for the innermost parallel loop and strip mine this loop to the
innermost level using a strip-mine factor corresponding to the number of vector
iterations.

For whatever reason, the code that implemented this feature was broken. We now
added a comment, a test case and obviously also the right code.

llvm-svn: 203544
2014-03-11 06:27:36 +00:00
Tobias Grosser e655754d57 Update CLooG and some test cases
This is necessary to avoid test failures in the CLooG test suite due to the
recent isl update.

We also need to update two polly test cases which rely on a certain order in the
textual description that isl chooses for its sets and maps. Changes here are not
often, but we should probably switch to a check that verifies such maps are
semantically equivalent instead of represented identically.

llvm-svn: 203476
2014-03-10 17:31:22 +00:00
Tobias Grosser 37c9b8e0f2 Emit llvm.loop metadata for parallel loops
For now we only mark innermost loops for the loop vectorizer.  We could later
also mark not-innermost loops to enable the introduction of openmp parallelism.

llvm-svn: 202854
2014-03-04 14:59:00 +00:00
Tobias Grosser 356faa8f09 Dead code elimination: Schedule another approximative step before actual DCE
In 'obsequi' we have a scop in which the current dead code elimination works,
but the generated code is way too complex. To avoid this trouble (and to not
disable the DCE entirely) we add an additional approximative step before
the actual dead code elimination. This should fix one of the two current
nightly-test issues.

Polly could be improved to handle 'obsequi' by teaching it to introduce only a
single parameter for (%1 and zext %1) which halves the number of parameters and
allows polly to derive a simpler representation for the set of live iterations.
However, this needs some time to investigate.

I will commit a test case as soon as we have a reduced one.

llvm-svn: 202010
2014-02-24 08:52:20 +00:00
Tobias Grosser 472d3b7037 codegen: Update LoopInfo correctly
Add the 'polly.start' basic block to the loop that surrounds the scop we just
codegenerate.

This fixes PR13441

llvm-svn: 202000
2014-02-24 00:50:49 +00:00