Commit Graph

772 Commits

Author SHA1 Message Date
Philip Reames 963febd4f8 Fix for pr24866
Turns out that not every basic block is guaranteed to have a node within the DominatorTree.  This is really hard to trigger, but the test case from the PR managed to do so.  There's active discussion continuing about what documentation and/or invariants needed cleaned up.

llvm-svn: 248216
2015-09-21 22:04:10 +00:00
Artur Pilipenko 84bc62f7a3 Support align attribute for return values
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D12844

llvm-svn: 247984
2015-09-18 12:33:31 +00:00
David Blaikie 2f40830dde [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Matthew Simpson ddb4d9741f [SCEV] Consistently Handle Expressions That Cannot Be Divided
This patch addresses the issue of SCEV division asserting on some
input expressions (e.g., non-affine expressions) and quietly giving
up on others.  When giving up, we set the quotient to be equal to
zero and the remainder to be equal to the numerator. With this
patch, we always quietly give up when we cannot perform the
division.

This patch also adds a test case for DependenceAnalysis that
previously caused an assertion.

Differential Revision: http://reviews.llvm.org/D11725

llvm-svn: 247314
2015-09-10 18:12:47 +00:00
Sanjoy Das f3132d3b03 [ScalarEvolution] Fix PR24757.
Summary:
PR24757 was caused by some incorect math in
`ScalarEvolution::HowFarToZero` -- the smallest unsigned solution for X
in

  2^N * A = 2^N * X

is not necessarily A.

Reviewers: atrick, majnemer, meheff

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12721

llvm-svn: 247242
2015-09-10 05:27:38 +00:00
Piotr Padlewski 0dde00d239 ScalarEvolution assume hanging bugfix
http://reviews.llvm.org/D12719

llvm-svn: 247184
2015-09-09 20:47:30 +00:00
Chandler Carruth 7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Silviu Baranga a3e27edb5d [CostModel][AArch64] Remove amortization factor for some of the vector select instructions
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.

Reviewers: mcrosier

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12724

llvm-svn: 247156
2015-09-09 15:35:02 +00:00
Diego Novillo f9aa39b0cf Fix PR 24723 - Handle 0-mass backedges in irreducible loops
This corner case happens when we have an irreducible SCC that is
deeply nested.  As we work down the tree, the backedge masses start
getting smaller and smaller until we reach one that is down to 0.

Since we distribute the incoming mass using the backedge masses as
weight, the distributor does not allow zero weights.  So, we simply
ignore them (which will just use the weights of the non-zero nodes).

llvm-svn: 247050
2015-09-08 19:22:17 +00:00
Hal Finkel f11bc761d8 [PowerPC] Include the permutation cost for unaligned vector loads
Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX
types), even when accounting for the combining that takes place for multiple
consecutive such loads, there is at least one load instructions and one
permutation for each load. Make sure the cost reported reflects the cost of the
permutes as well.

llvm-svn: 246807
2015-09-03 21:23:18 +00:00
Hal Finkel 79dbf5b562 [PowerPC] Cleanup cost model for unaligned vector loads/stores
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.

llvm-svn: 246712
2015-09-02 21:03:28 +00:00
Quentin Colombet 5989bc6f41 [BasicAA] Fix the handling of sext and zext in the analysis of GEPs.
Hopefully this will end the GEPs saga!

This commit reverts r245394, i.e., it reapplies r221876 while incorporating the
fixes from D11847.
r221876 was not reapplied alone because it was not safe and D11847 was not
applied alone because it needs r221876 to produce correct results.

This should fix PR24596.

Original commit message for r221876:
Let's try this again...

This reverts r219432, plus a bug fix.

Description of the bug in r219432 (by Nick):

The bug was using AllPositive to break out of the loop; if the loop break
condition i != e is changed to i != e && AllPositive then the
test_modulo_analysis_with_global test I've added will fail as the Modulo will
be calculated incorrectly (as the last loop iteration is skipped, so Modulo
isn't updated with its Scale).

Nick also adds this comment:

ComputeSignBit is safe to use in loops as it takes into account phi nodes, and
the  == EK_ZeroEx check is safe in loops as, no matter how the variable changes
between iterations, zero-extensions will always guarantee a zero sign bit. The
isValueEqualInPotentialCycles check is therefore definitely not needed as all
the variable analysis holds no matter how the variables change between loop
iterations.

And this patch also adds another enhancement to GetLinearExpression - basically
to convert ConstantInts to Offsets (see test_const_eval and
test_const_eval_scaled for the situations this improves).

Original commit message:

This reverts r218944, which reverted r218714, plus a bug fix.

Description of the bug in r218714 (by Nick):

The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.

Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.

Original commit message:

Two related things:

1. Fixes a bug when calculating the offset in GetLinearExpression. The code
   previously used zext to extend the offset, so negative offsets were converted
   to large positive ones.

2. Enhance aliasGEP to deduce that, if the difference between two GEP
   allocations is positive and all the variables that govern the offset are also
   positive (i.e. the offset is strictly after the higher base pointer), then
   locations that fit in the gap between the two base pointers are NoAlias.

Patch by Nick White!

Message from D11847:
Un-revert of r241981 and fix for PR23626. The 'Or' case of GetLinearExpression
delegates to 'Add' if possible, and if not it returns an Opaque value.
Unfortunately the Scale and Offsets weren't being set (and so defaulted to 0) -
and a scale of zero effectively removes the variable from the GEP instruction.
This meant that BasicAA would return MustAliases when it should have been
returning PartialAliases (and PR23626 was an example of the GVN pass using an
incorrect MustAlias to merge loads from what should have been different
pointers).

Differential Revision: http://reviews.llvm.org/D11847
Patch by Nick White <n.j.white@gmail.com>!

llvm-svn: 246502
2015-08-31 22:32:47 +00:00
George Burgess IV 68b36e01da Fix: CFLAA -- Mark no-args returns as unknown
Prior to this patch, we hadn't been marking StratifiedSets with the
appropriate StratifiedAttrs when handling the result of no-args call
instructions. This caused us to report NoAlias when handed, for
example, an escaped alloca and a result from an opaque function. Now we
properly mark the return value of said functions.

Thanks again to Chandler, Richard, and Nick for pinging me about this.

Differential review: http://reviews.llvm.org/D12408

llvm-svn: 246240
2015-08-28 00:16:18 +00:00
Hal Finkel 0ef2b10f16 Fix how DependenceAnalysis calls delinearization
Fix how DependenceAnalysis calls delinearization, mirroring what is done in
Delinearization.cpp (mostly by making sure to call getSCEVAtScope before
delinearizing, and by removing the unnecessary 'Pairs == 1' check).

Patch by Vaivaswatha Nagaraj!

llvm-svn: 245408
2015-08-19 02:56:36 +00:00
Quentin Colombet 861ad97e6f [BasicAA] Add a test for PR24468 to be sure we won't regress
when we finally get the GEP aliasing right.

llvm-svn: 245395
2015-08-19 00:08:26 +00:00
Quentin Colombet b700e357b5 [BasicAA] Revert r221876 because it can produce incorrect aliasing
information: see PR24468.

llvm-svn: 245394
2015-08-19 00:07:20 +00:00
Silviu Baranga d5ac26937c [CostModel][ARM] Increase cost of insert/extract operations
Summary:
This change limits the minimum cost of an insert/extract
element operation to 2 in cases where this would result
in mixing of NEON and VFP code.

Reviewers: rengolin

Subscribers: mssimpso, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12030

llvm-svn: 245225
2015-08-17 15:57:05 +00:00
Artur Pilipenko 34d8ba84c8 Take alignment into account in isSafeToSpeculativelyExecute and isSafeToLoadUnconditionally.
Reviewed By: hfinkel, sanjoy, MatzeB

Differential Revision: http://reviews.llvm.org/D9791

llvm-svn: 245223
2015-08-17 15:54:26 +00:00
Michael Kuperstein adc4e9c414 [GMR] isNonEscapingGlobalNoAlias() should look through Bitcasts/GEPs when looking at loads.
This fixes yet another case from PR24288.

Differential Revision: http://reviews.llvm.org/D12064

llvm-svn: 245207
2015-08-17 10:06:08 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Bjarke Hammersholt Roune 9791ed4705 [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.

This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i - offset];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i * stride];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[3 * (i << 7)];


Reviewers: atrick, sanjoy

Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben

Differential Revision: http://reviews.llvm.org/D11860

llvm-svn: 245118
2015-08-14 22:45:26 +00:00
Igor Laevsky 30143aee11 Emit argmemonly attribute for intrinsics.
Differential Revision: http://reviews.llvm.org/D11352

llvm-svn: 244920
2015-08-13 17:40:04 +00:00
Adam Nemet abc794d3db [LAA] Fix typo in test
llvm-svn: 244690
2015-08-11 23:03:09 +00:00
Michael Kuperstein 07f31d92ca [GMR] Be a bit smarter about which globals don't alias when doing recursive lookups
Should hopefully fix the remainder of PR24288.

Differential Revision: http://reviews.llvm.org/D11900

llvm-svn: 244575
2015-08-11 08:06:44 +00:00
Jonathan Roelofs 49e46ce8e2 Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.

llvm-svn: 244481
2015-08-10 19:01:27 +00:00
Chandler Carruth 405e4f9051 [GMR] Teach the conservative path of GMR to catch even more easy cases.
In PR24288 it was pointed out that the easy case of a non-escaping
global and something that *obviously* required an escape sometimes is
hidden behind PHIs (or selects in theory). Because we have this binary
test, we can easily just check that all possible input values satisfy
the requirement. This is done with a (very small) recursion through PHIs
and selects. With this, the specific example from the PR is correctly
folded by GVN.

Differential Revision: http://reviews.llvm.org/D11707

llvm-svn: 244078
2015-08-05 17:58:30 +00:00
Simon Pilgrim 86478c6909 [X86][SSE] Vectorize i64 ASHR operations
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.

Differential Revision: http://reviews.llvm.org/D11439

llvm-svn: 243569
2015-07-29 20:31:45 +00:00
Jingyue Wu 42f1d67a45 [SCEV] Apply NSW and NUW flags via poison value analysis
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.

This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:

  for (int i = 0; i < limit; ++i) {
    sum += ptr[i + offset];
  }

Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.


There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392

Patch by Bjarke Roune

Reviewers: eliben, atrick, sanjoy

Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits

Differential Revision: http://reviews.llvm.org/D11212

llvm-svn: 243460
2015-07-28 18:22:40 +00:00
Silviu Baranga 4825060059 [LAA] Add clarifying comments for the checking pointer grouping algorithm. NFC
llvm-svn: 243416
2015-07-28 13:44:08 +00:00
Chandler Carruth 99ad7bb49c [GMR] Teach GlobalsModRef to distinguish an important and safe case of
no-alias with non-addr-taken globals: they cannot alias a captured
pointer.

If the non-global underlying object would have been a capture were it to
alias the global, we can firmly conclude no-alias. It isn't reasonable
for a transformation to introduce a capture in a way observable by an
alias analysis. Consider, even if it were to temporarily capture one
globals address into another global and then restore the other global
afterward, there would be no way for the load in the alias query to
observe that capture event correctly. If it observes it then the
temporary capturing would have changed the meaning of the program,
making it an invalid transformation. Even instrumentation passes or
a pass which is synthesizing stores to global variables to expose race
conditions in programs could not trigger this unless it queried the
alias analysis infrastructure mid-transform, in which case it seems
reasonable to return results from before the transform started.

See the comments in the change for a more detailed outlining of the
theory here.

This should address the primary performance regression found when the
non-conservatively-correct path of the alias query was disabled.

Differential Revision: http://reviews.llvm.org/D11410

llvm-svn: 243405
2015-07-28 11:11:11 +00:00
Adam Nemet 54f0b83ee2 [LAA] Split out a helper to print a collection of memchecks
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address.  This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).

E.g. before we printed this:

    Run-time memory checks:
    Check 0:
      Comparing group 0:
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
      Against group 1:
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
    ...
    Grouped accesses:
      Group 0:
        (Low: %c High: (78 + %c))
          Member: {%c,+,4}<%for.body>
          Member: {(2 + %c),+,4}<%for.body>

Now we print this (changes are underlined):

    Run-time memory checks:
    Check 0:
      Comparing group (0x7f9c6040c320):
                       ~~~~~~~~~~~~~~
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
      Against group (0x7f9c6040c358):
                     ~~~~~~~~~~~~~~
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
    ...
    Grouped accesses:
      Group 0x7f9c6040c320:
            ~~~~~~~~~~~~~~
        (Low: %c High: (78 + %c))
          Member: {(2 + %c),+,4}<%for.body>
          Member: {%c,+,4}<%for.body>

llvm-svn: 243354
2015-07-27 23:54:41 +00:00
Jingyue Wu bfefff555e Roll forward r243250
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.

llvm-svn: 243253
2015-07-26 19:10:03 +00:00
Jingyue Wu 84879b71a9 Revert r243250
breaks tests

llvm-svn: 243251
2015-07-26 18:30:13 +00:00
Jingyue Wu bf485f059c [TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.

I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.

Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.

This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.

Reviewers: chandlerc, hfinkel

Subscribers: aschwaighofer, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D9819

llvm-svn: 243250
2015-07-26 17:28:13 +00:00
Igor Laevsky 2ba8374612 NFC. Explicitly specify attributes in BasicAA/cs-cs.ll test.
This will simplify verifying correctness for a changes which modify attributes.

llvm-svn: 243016
2015-07-23 14:31:18 +00:00
Jingyue Wu d058ea927f [MDA] change BlockScanLimit into a command line option.
Summary:
In the benchmark (https://github.com/vetter/shoc) we are researching,
the duplicated load is not eliminated because MemoryDependenceAnalysis
hit the BlockScanLimit. This patch change it into a command line option
instead of a hardcoded value.

Patched by Xuetian Weng. 

Test Plan: test/Analysis/MemoryDependenceAnalysis/memdep-block-scan-limit.ll

Reviewers: jingyue, reames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D11366

llvm-svn: 242842
2015-07-21 21:50:39 +00:00
Simon Pilgrim 59764dccfb [X86][SSE] Updated SHL/LSHR i64 vectorization costs.
This was missed in D8416.

llvm-svn: 242621
2015-07-18 20:06:30 +00:00
Chandler Carruth f55803f761 [PM/AA] Disable the core unsafe aspect of GlobalsModRef in the face of
basic changes to the IR such as folding pointers through PHIs, Selects,
integer casts, store/load pairs, or outlining.

This leaves the feature available behind a flag. This flag's default
could be flipped if necessary, but the real-world performance impact of
this particular feature of GMR may not be sufficiently significant for
many folks to want to run the risk.

Currently, the risk here is somewhat mitigated by half-hearted attempts
to update GlobalsModRef when the rest of the optimizer changes
something. However, I am currently trying to remove that update
mechanism as it makes migrating the AA infrastructure to a form that can
be readily shared between new and old pass managers very challenging.
Without this update mechanism, it is possible that this still unlikely
failure mode will start to trip people, and so I wanted to try to
proactively avoid that.

There is a lengthy discussion on the mailing list about why the core
approach here is flawed, and likely would need to look totally different
to be both reasonably effective and resilient to basic IR changes
occuring. This patch is essentially the first of two which will enact
the result of that discussion. The next patch will remove the current
update mechanism.

Thanks to lots of folks that helped look at this from different angles.
Especial thanks to Michael Zolotukhin for doing some very prelimanary
benchmarking of LTO without GlobalsModRef to get a rough idea of the
impact we could be facing here. So far, it looks very small, but there
are some concerns lingering from other benchmarking. The default here
may get flipped if performance results end up pointing at this as a more
significant issue.

Also thanks to Pete and Gerolf for reviewing!

Differential Revision: http://reviews.llvm.org/D11213

llvm-svn: 242512
2015-07-17 06:58:24 +00:00
Silviu Baranga 0e5804a6af Fix memcheck interval ends for pointers with negative strides
Summary:
The checking pointer grouping algorithm assumes that the
starts/ends of the pointers are well formed (start <= end).

The runtime memory checking algorithm also assumes this by doing:

 start0 < end1 && start1 < end0

to detect conflicts. This check only works if start0 <= end0 and
start1 <= end1.

This change correctly orders the interval ends by either checking
the stride (if it is constant) or by using min/max SCEV expressions.

Reviewers: anemet, rengolin

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11149

llvm-svn: 242400
2015-07-16 14:02:58 +00:00
Sean Silva 6dd10bade7 Add a test for r242281 from an old patch of mine.
This isn't thorough, but should serve as a sanity check.

llvm-svn: 242356
2015-07-15 23:23:02 +00:00
Tobias Edler von Koch d8ce16b1e6 Analyze recursive PHI nodes in BasicAA
Summary:
This patch allows phi nodes like
  %x = phi [ %incptr, ... ] [ %var, ... ]
  %incptr = getelementptr %x, 1
to be analyzed by BasicAliasAnalysis.

In aliasPHI, we can detect incoming values that are recursive GEPs with a
constant offset. Instead of trying to analyze a recursive GEP (and failing), 
we now ignore it and instead set the size of the memory referenced by
the PHINode to UnknownSize. This represents all the possible memory
locations the pointer represented by the PHINode could be advanced to
by the GEP.

For now, this new behavior is turned off by default to allow debugging of
performance degradations seen with SPEC/x86 and Hexagon benchmarks.
The flag -basicaa-recphi turns it on.


Reviewers: hfinkel, sanjoy

Subscribers: tobiasvk_caf, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D10368

llvm-svn: 242320
2015-07-15 19:32:22 +00:00
Silviu Baranga a647c30f88 Cleanup after r241809 - remove uncessary call to std::sort
Summary:
The iteration order within a member of DepCands is deterministic
and therefore we don't have to sort the accesses within a member.
We also don't have to copy the indices of the pointers into a
vector, since we can iterate over the members of the class.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11145

llvm-svn: 242033
2015-07-13 14:48:24 +00:00
Manuel Klimek 779cf85a4f Revert r241981 "Revert "Revert r236894 "[BasicAA] Fix zext & sext handling"""
The repros from PR23626 still fail.

llvm-svn: 242025
2015-07-13 13:50:55 +00:00
Simon Pilgrim 64cc4ad0a2 [X86][SSE] Vectorized v4i32 non-uniform shifts.
While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr opeations are still scalarized.

This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together.

Differential Revision: http://reviews.llvm.org/D11063

llvm-svn: 241989
2015-07-12 11:15:19 +00:00
Hal Finkel ef28aad9f4 Revert "Revert r236894 "[BasicAA] Fix zext & sext handling""
r236894 caused PR23626 (Clang miscompiles webkit's base64 decoder), and was
reverted in r237984. This reapplies the patch with an additional test case for
PR23626 and the associated fix (both scales and offsets in the
BasicAliasAnalysis::constantOffsetHeuristic should initially be zero).

Patch by Nick White, thanks!

llvm-svn: 241981
2015-07-11 11:04:54 +00:00
Igor Laevsky 39d662f7ba Add argmemonly attribute.
This change adds new attribute called "argmemonly". Function marked with this attribute can only access memory through it's argument pointers. This attribute directly corresponds to the "OnlyAccessesArgumentPointees" ModRef behaviour in alias analysis.

Differential Revision: http://reviews.llvm.org/D10398

llvm-svn: 241979
2015-07-11 10:30:36 +00:00
Silviu Baranga 3e3095c53a Add a test of a regression discovered during testing of r241673
Summary:
We were missing a corner case where DepCands was not available,
but we were using DepCands to compute the checking pointer
groups.

This adds a test for that regression.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11068

llvm-svn: 241818
2015-07-09 16:40:25 +00:00
Silviu Baranga ce3877fc8c Don't rely on the DepCands iteration order when constructing checking pointer groups
Summary:
The checking pointer group construction algorithm relied on the iteration on DepCands.
We would need the same leaders across runs and the same iteration order over the underlying std::set for determinism.

This changes the algorithm to process the pointers in the order in which they were added to the runtime check, which is deterministic.
We need to update the tests, since the order in which pointers appear has changed.

No new tests were added, since it is impossible to test for non-determinism.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11064

llvm-svn: 241809
2015-07-09 15:18:25 +00:00
Adam Nemet 424edc6c80 [LAA] Revert a small part of r239295
This commit ([LAA] Fix estimation of number of memchecks) regressed the
logic a bit.  We shouldn't quit the analysis if we encounter a pointer
without known bounds *unless* we actually need to emit a memcheck for
it.

The original code was using NumComparisons which is now computed
differently.  Instead I compute NeedRTCheck from NumReadPtrChecks and
NumWritePtrChecks.

As side note, I find the separation of NeedRTCheck and CanDoRT
confusing, so I will try to merge them in a follow-up patch.

llvm-svn: 241756
2015-07-08 22:58:48 +00:00
Silviu Baranga 1b6b50a921 [LAA] Merge memchecks for accesses separated by a constant offset
Summary:
Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.

However, it should be possible to merge such checks,
sice a check of any interval againt two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent with
a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).

As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.

Reviewers: anemet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10386

llvm-svn: 241673
2015-07-08 09:16:33 +00:00