Commit Graph

6419 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith 4ec55f8ab6 Reapply "ValueMapper: Treat LocalAsMetadata more like function-local Values"
This reverts commit r265765, reapplying r265759 after changing a call from
LocalAsMetadata::get to ValueAsMetadata::get (and adding a unit test).  When a
local value is mapped to a constant (like "i32 %a" => "i32 7"), the new debug
intrinsic operand may no longer be pointing at a local.

    http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/19020/

The previous coommit message follows:

--

This is a partial re-commit -- maybe more of a re-implementation -- of
r265631 (reverted in r265637).

This makes RF_IgnoreMissingLocals behave (almost) consistently between
the Value and the Metadata hierarchy.  In particular:

  - MapValue returns nullptr or "metadata !{}" for missing locals in
    MetadataAsValue/LocalAsMetadata bridging paris, depending on
    the RF_IgnoreMissingLocals flag.

  - MapValue doesn't memoize LocalAsMetadata-related results.

  - MapMetadata no longer deals with LocalAsMetadata or
    RF_IgnoreMissingLocals at all.  (This wasn't in r265631 at all, but
    I realized during testing it would make the patch simpler with no
    loss of generality.)

r265631 went too far, making both functions universally ignore
RF_IgnoreMissingLocals.  This broke building (e.g.) compiler-rt.
Reassociate (and possibly other passes) don't currently maintain
dominates-use invariants for metadata operands, resulting in IR like
this:

    define void @foo(i32 %arg) {
      call void @llvm.some.intrinsic(metadata i32 %x)
      %x = add i32 1, i32 %arg
    }

If the inliner chooses to inline @foo into another function, then
RemapInstruction will call `MapValue(metadata i32 %x)` and assert that
the return is not nullptr.

I've filed PR27273 to add a Verifier check and fix the underlying
problem in the optimization passes.

As a workaround, return `!{}` instead of nullptr for unmapped
LocalAsMetadata when RF_IgnoreMissingLocals is unset.  Otherwise, match
the behaviour of r265631.

Original commit message:

    ValueMapper: Make LocalAsMetadata match function-local Values

    Start treating LocalAsMetadata similarly to function-local members of
    the Value hierarchy in MapValue and MapMetadata.

      - Don't memoize them.
      - Return nullptr if they are missing.

    This also cleans up ConstantAsMetadata to stop listening to the
    RF_IgnoreMissingLocals flag.

llvm-svn: 265768
2016-04-08 03:13:22 +00:00
Duncan P. N. Exon Smith 805873148a Revert "ValueMapper: Treat LocalAsMetadata more like function-local Values"
This reverts commit r265759, since even this limited version breaks some
bots:
  http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/3311
  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/17696

This also reverts r265761 "ValueMapper: Unduplicate
RF_NoModuleLevelChanges check, NFC", since I had trouble separating it
from r265759.

llvm-svn: 265765
2016-04-08 00:56:21 +00:00
Sanjoy Das 5ce3272833 Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
Duncan P. N. Exon Smith 267185ec92 ValueMapper: Treat LocalAsMetadata more like function-local Values
This is a partial re-commit -- maybe more of a re-implementation -- of
r265631 (reverted in r265637).

This makes RF_IgnoreMissingLocals behave (almost) consistently between
the Value and the Metadata hierarchy.  In particular:

  - MapValue returns nullptr or "metadata !{}" for missing locals in
    MetadataAsValue/LocalAsMetadata bridging paris, depending on
    the RF_IgnoreMissingLocals flag.

  - MapValue doesn't memoize LocalAsMetadata-related results.

  - MapMetadata no longer deals with LocalAsMetadata or
    RF_IgnoreMissingLocals at all.  (This wasn't in r265631 at all, but
    I realized during testing it would make the patch simpler with no
    loss of generality.)

r265631 went too far, making both functions universally ignore
RF_IgnoreMissingLocals.  This broke building (e.g.) compiler-rt.
Reassociate (and possibly other passes) don't currently maintain
dominates-use invariants for metadata operands, resulting in IR like
this:

    define void @foo(i32 %arg) {
      call void @llvm.some.intrinsic(metadata i32 %x)
      %x = add i32 1, i32 %arg
    }

If the inliner chooses to inline @foo into another function, then
RemapInstruction will call `MapValue(metadata i32 %x)` and assert that
the return is not nullptr.

I've filed PR27273 to add a Verifier check and fix the underlying
problem in the optimization passes.

As a workaround, return `!{}` instead of nullptr for unmapped
LocalAsMetadata when RF_IgnoreMissingLocals is unset.  Otherwise, match
the behaviour of r265631.

Original commit message:

    ValueMapper: Make LocalAsMetadata match function-local Values

    Start treating LocalAsMetadata similarly to function-local members of
    the Value hierarchy in MapValue and MapMetadata.

      - Don't memoize them.
      - Return nullptr if they are missing.

    This also cleans up ConstantAsMetadata to stop listening to the
    RF_IgnoreMissingLocals flag.

llvm-svn: 265759
2016-04-08 00:33:44 +00:00
Ulrich Weigand 6e6966460a [GVN] Fix handling of sub-byte types in big-endian mode
When GVN wants to re-interpret an already available value in a smaller
type, it needs to right-shift the value on big-endian systems to ensure
the correct bytes are accessed.  The shift value is the difference of
the sizes of the two types.

This is correct as long as both types occupy multiples of full bytes.
However, when one of them is a sub-byte type like i1, this no longer
holds true: we still need to shift, but only to access the correct
*byte*.  Accessing bits within the byte requires no shift in either
endianness; e.g. an i1 resides in the least-significant bit of its
containing byte on both big- and little-endian systems.

Therefore, the appropriate shift value to be used is the difference of
the *storage* sizes of the two types.  This is already handled correctly
in one place where such a shift takes place (GetStoreValueForLoad), but
is incorrect in two other places: GetLoadValueForLoad and
CoerceAvailableValueToLoadType.

This patch changes both places to use the storage size as well.

Differential Revision: http://reviews.llvm.org/D18662

llvm-svn: 265684
2016-04-07 15:45:02 +00:00
Michael Zolotukhin 97567e141e [LoopUnroll] Fix the way we update DT after complete unrolling.
Updating dominators for exit-blocks of the unrolled loops is not enough,
as shown in PR27157. The proper way is to update dominators for all
dominance-children of original loop blocks.

llvm-svn: 265605
2016-04-06 21:47:12 +00:00
Sanjay Patel 6cc488004d regenerate checks
llvm-svn: 265591
2016-04-06 19:58:06 +00:00
Fiona Glaser 16332ba861 LoopUnroll: only allow non-modulo Partial unrolling when Runtime=true
Patch by Evgeny Stupachenko <evstupac@gmail.com>.

llvm-svn: 265558
2016-04-06 16:43:45 +00:00
Silviu Baranga a393baf1fd Revert r265535 until we know how we can fix the bots
llvm-svn: 265541
2016-04-06 14:06:32 +00:00
Silviu Baranga 72b4a4a330 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265535
2016-04-06 13:18:26 +00:00
David Majnemer 12fd50410d [SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnan
To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has
undefined behavior for negative numbers other than -0.0 (which allows
for better optimization, because there is no need to worry about errno
being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."

This means that it's unsafe to replace sqrt with llvm.sqrt unless the
call is annotated with nnan.

Thanks to Hal Finkel for pointing this out!

llvm-svn: 265521
2016-04-06 07:04:53 +00:00
Duncan P. N. Exon Smith 6f2e37429a ValueMapper: Fix delayed blockaddress handling after r265273
r265273 added Mapper::mapBlockAddress, which delays mapping a
blockaddress value until the function has a body.  The condition was
backwards, and should be checking Function::empty instead of
GlobalValue::isDeclaration.

llvm-svn: 265508
2016-04-06 02:25:12 +00:00
David Majnemer 25d03dbcde [SLPVectorizer] Vectorize libcalls of sqrt
We didn't realize that we could transform the libcall into a vectorized
intrinsic.

llvm-svn: 265493
2016-04-06 00:14:59 +00:00
Davide Italiano ea04026c13 [DebugInfo] Fix tests so that each subprogram belongs to a CU.
llvm-svn: 265490
2016-04-05 23:37:08 +00:00
Sanjoy Das 49e974b33b [RS4GC] Better codegen for deoptimize calls
Don't emit a gc.result for a statepoint lowered from
@llvm.experimental.deoptimize since the call into __llvm_deoptimize is
effectively noreturn.  Instead follow the corresponding gc.statepoint
with an "unreachable".

llvm-svn: 265485
2016-04-05 23:18:35 +00:00
Sanjay Patel fd16e62d56 add tests to show missing optimization from D18230
llvm-svn: 265431
2016-04-05 18:09:36 +00:00
Sanjay Patel 6ecf1b6760 [InstCombine] regenerate checks
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to CHECK-NEXT the first line of the IR function. This ensures that nothing bad
has happened before that.

llvm-svn: 265417
2016-04-05 17:24:54 +00:00
Chuang-Yu Cheng d3fb38cae5 Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.

Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng

http://reviews.llvm.org/D16984

llvm-svn: 265397
2016-04-05 14:06:20 +00:00
David L Kreitzer 188de5ae69 Adds the ability to use an epilog remainder loop during loop unrolling and makes
this the default behavior.

Patch by Evgeny Stupachenko (evstupac@gmail.com).

Differential Revision: http://reviews.llvm.org/D18158

llvm-svn: 265388
2016-04-05 12:19:35 +00:00
Zia Ansari a82a58a4e5 Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.
Commit for Evgeny Stupachenko (evstupac@gmail.com)

Differential Revision: http://reviews.llvm.org/D18290

llvm-svn: 265337
2016-04-04 19:24:46 +00:00
Teresa Johnson 03e93bab7f Fix bot errors from r265327, exact GUID which depends on path
E.g. http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/21919

The source file path name will affect exact GUID, don't try to match
exact value.

llvm-svn: 265334
2016-04-04 19:11:00 +00:00
Betul Buyukkurt 18131c4216 [PGO] Avoid instrumenting direct callee's at value sites.
Direct callees' that are cast to other function prototypes,
show up in the Call/Invoke instructions as ConstantExpr's.
Currently llvm::CallSite's getCalledFunction() fails
to return the callees in such expressions as direct calls.
Value profiling should avoid instrumenting such cases. Mostly NFC.

llvm-svn: 265330
2016-04-04 18:56:36 +00:00
Teresa Johnson 916495d894 [ThinLTO] Add option to dump value name to GUID mapping
Summary:
Useful for debugging since we lose this correlation after the permodule
summary/VST is read and until we later materialize source modules in the
function importer.

Reviewers: joker.eph

Subscribers: llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18555

llvm-svn: 265327
2016-04-04 18:52:58 +00:00
Peter Zotov 0b6d7bc682 [CodeGenPrepare] Avoid sinking soft-FP comparisons
Sinking comparisons in CGP can undo the job of hoisting them done
earlier by LICM, and soft-FP makes this an expensive mistake.

A common pattern that produces floating point comparisons uniform
over a loop is an explicit check for division by zero. If the divisor
is hoisted out of the loop, the comparison can also be, but hoisting
the function that unwinds is never legal, since it may cause side
effects in the loop body prior to the unwinding to not be executed.

Differential Revision: http://reviews.llvm.org/D18744

llvm-svn: 265264
2016-04-03 16:36:17 +00:00
Peter Zotov 0218d0f383 Mark some FP intrinsics as safe to speculatively execute
Floating point intrinsics in LLVM are generally not speculatively
executed, since most of them are defined to behave the same as libm
functions, which set errno.

However, the only error that can happen  when executing ceil, floor,
nearbyint, rint and round libm functions per POSIX.1-2001 is -ERANGE,
and that requires the maximum value of the exponent to be smaller
than  the number of mantissa bits, which is not the case with any of
the floating point types supported by LLVM.

The trunc and copysign functions never set errno per per POSIX.1-2001.

Differential Revision: http://reviews.llvm.org/D18643

llvm-svn: 265262
2016-04-03 12:30:46 +00:00
David Majnemer fe3f9d1721 [InstCombine] Don't sink an instr after a catchswitch
A catchswitch is a terminator, instructions cannot be inserted after it.

llvm-svn: 265158
2016-04-01 17:28:17 +00:00
David Majnemer 6f1f85f0e1 [SLPVectorizer] Don't insert an extractelement before a catchswitch
A catchswitch cannot be preceded by another instruction in the same
basic block (other than a PHI node).

Instead, insert the extract element right after the materialization of
the vectorized value.  This isn't optimal but is a reasonable compromise
given the constraints of WinEH.

This fixes PR27163.

llvm-svn: 265157
2016-04-01 17:28:15 +00:00
Vedant Kumar 7784171721 [PGOProfile] Rename a test to make it more reusable, NFC
llvm-svn: 265144
2016-04-01 15:45:33 +00:00
Sanjoy Das f83ab6de56 Don't insert stackrestore on deoptimizing returns
They're not necessary (since the stack pointer is trivially restored on
return), and the way LLVM inserts the stackrestore calls breaks the
IR (we get a stackrestore between the deoptimize call and the return).

llvm-svn: 265101
2016-04-01 02:51:30 +00:00
Sanjoy Das 18b92968ea Don't insert lifetime end markers on deoptimizing returns
They're not necessary (since the lifetime of the alloca is trivially
over due to the return), and the way LLVM inserts the lifetime.end
markers breaks the IR (we get a lifetime end marker between the
deoptimize call and the return).

llvm-svn: 265100
2016-04-01 02:51:26 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
David Majnemer ae272d718e [NVPTX] Infer __nvvm_reflect as nounwind, readnone
This patch simply mirrors the attributes we give to @llvm.nvvm.reflect
to the __nvvm_reflect libdevice call.  This shaves about 30% of the code
in libdevice away because of CSE opportunities.  It's also helps us
figure out that libdevice implementations of transcendental functions
don't have side-effects.

llvm-svn: 265060
2016-03-31 21:29:57 +00:00
Sanjoy Das 56df0ec610 [InstCombine] Fix incorrect rule from rL236202
The rule for SMIN introduced in rL236202 doesn't work as advertised: the
check for Pred == ICmpInst::ICMP_SGT was missing.

llvm-svn: 264996
2016-03-31 05:14:34 +00:00
Davide Italiano 936a2b09f3 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.

llvm-svn: 264990
2016-03-31 03:40:07 +00:00
Sanjoy Das 021de058df Introduce a @llvm.experimental.guard intrinsic
Summary:
As discussed on llvm-dev[1].

This change adds the basic boilerplate code around having this intrinsic
in LLVM:

 - Changes in Intrinsics.td, and the IR Verifier
 - A lowering pass to lower @llvm.experimental.guard to normal
   control flow
 - Inliner support

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html

Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18527

llvm-svn: 264976
2016-03-31 00:18:46 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Matt Arsenault 5cd4f8f89f AMDGPU: Constant folding for frexp_mant
llvm-svn: 264943
2016-03-30 22:28:26 +00:00
David Majnemer 5d518386b6 [IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.

This fixes PR27133.

llvm-svn: 264926
2016-03-30 21:12:06 +00:00
Justin Bogner a5a6378700 test: Remove a test for a transform that hasn't existed in 5 years.
The TailDup transform was removed in r138841 in 2011, along with most
of the tests for it. This test, however, was missed. Probably because
it had already been XFAIL'd for 3 years at that point (since r52243!)
and continued to fail when the opt flag for -tailduplicate stopped
being valid.

llvm-svn: 264916
2016-03-30 20:36:07 +00:00
Hal Finkel 2e0ff2b244 [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

llvm-svn: 264904
2016-03-30 19:37:08 +00:00
James Molloy 8e46cd05a1 [VectorUtils] Don't try and truncate PHIs to a smaller bitwidth
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.

However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018.

This fixes PR17018.

llvm-svn: 264852
2016-03-30 10:11:43 +00:00
George Burgess IV 49cad7d70b [MemorySSA] Make the visitor more careful with calls.
Prior to this patch, the MemorySSA caching visitor would cache all
calls that it visited. When paired with phi optimization, this can be
problematic. Consider:

define void @foo() {
  ; 1 = MemoryDef(liveOnEntry)
  call void @clobberFunction()
  br i1 undef, label %if.end, label %if.then

if.then:
  ; MemoryUse(??)
  call void @readOnlyFunction()
  ; 2 = MemoryDef(1)
  call void @clobberFunction()
  br label %if.end

if.end:
  ; 3 = MemoryPhi(...)
  ; MemoryUse(?)
  call void @readOnlyFunction()
  ret void
}

When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to
cache them later. We ultimately end up not being able to optimize
passed the Phi, so we set MemoryUse(?) to point to the Phi. We then
cache the clobbering call for def 1 to be the Phi.

This commit changes this behavior so that we wipe out any calls
added to VisistedCalls while visiting the defs of a phi we couldn't
optimize.

Aside: With this patch, we now can bootstrap clang/LLVM without a
single MemorySSA verifier failure. Woohoo. :)

llvm-svn: 264820
2016-03-30 03:12:08 +00:00
Xinliang David Li a55fd1a9dc [PGO] Handle invoke inst in IR based icall instrumentation
Differential Revision: http://reviews.llvm.org/D18580

llvm-svn: 264818
2016-03-30 02:16:07 +00:00
George Burgess IV 82ee942a8c [MemorySSA] Change how the walker views/walks visited phis.
This patch teaches the caching MemorySSA walker a few things:

1. Not to walk Phis we've walked before. It seems that we tried to do
   this before, but it didn't work so well in cases like:

define void @foo() {
  %1 = alloca i8
  %2 = alloca i8
  br label %begin

begin:
  ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2})
  ; 1 = MemoryDef(3)
  store i8 0, i8* %2
  br label %end

end:
  ; MemoryUse(?)
  load i8, i8* %1
  ; 2 = MemoryDef(1)
  store i8 0, i8* %2
  br label %begin
}

Because we wouldn't put Phis in Q.Visited until we tried to visit them.
So, when trying to optimize MemoryUse(?):
  - We would visit 3 above
    - ...Which would make us put {%0,liveOnEntry} in Q.Visited
    - ...Which would make us visit {%0,liveOnEntry}
    - ...Which would make us put {%end,2} in Q.Visited
    - ...Which would make us visit {%end,2}
      - ...Which would make us visit 3
        - ...Which would realize we've already visited everything in 3
        - ...Which would make us conservatively return 3.

In the added test-case, (@looped_visitedonlyonce) this behavior would
cause us to give incorrect results. Specifically, we'd visit 4 twice
in the same query, but on the second visit, we'd skip while.cond because
it had been visited, visit if.then/if.then2, and cache "1" as the
clobbering def on the way back.

2. If we try to walk the defs of a {Phi,MemLoc} and see it has been
   visited before, just hand back the Phi we're trying to optimize.

I promise this isn't as terrible as it seems. :)

We now insert {Phi,MemLoc} pairs just before walking the Phi's upward
defs. So, we check the cache for the {Phi,MemLoc} pair before checking
if we've already walked the Phi.

The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache
entry if we've already fully walked it, because we cache as we go.

So, if the {Phi,MemLoc} pair isn't in cache, either:
 (a) we must be in the process of visiting it (in which case, we can't
     give a better answer in a cache-as-we-go DFS walker)

 (b) we visited it, but didn't cache it on the way back (...which seems
     to require `ModifyingAccess` to not dominate `StartingAccess`,
     so I'm 99% sure that would be an error. If it's not an error, I
     haven't been able to get it to happen locally, so I suspect it's
     rare.)

- - - - -

As a consequence of this change, we no longer skip upward defs of phis,
so we can kill the `VisitedOnlyOne` check. This gives us better accuracy
than we had before, at the cost of potentially doing a bit more work
when we have a loop.

llvm-svn: 264814
2016-03-30 00:26:26 +00:00
Adam Nemet 1428d41f9a [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

llvm-svn: 264806
2016-03-29 23:45:52 +00:00
Duncan P. N. Exon Smith e8eb94a9a5 ADCE: Remove debug info intrinsics in dead scopes
During ADCE, track which debug info scopes still have live references
from the code, and delete debug info intrinsics for the dead ones.

These intrinsics describe the locations of variables (in registers or
stack slots).  If there's no code left corresponding to a variable's
scope, then there's no way to reference the variable in the debugger and
it doesn't matter what its value is.

I add a DEBUG printout when the described location in an SSA register,
in case it helps some trying to track down why locations get lost.
However, we still delete these; the scope itself isn't attached to any
real code, so the ship has already sailed.

llvm-svn: 264800
2016-03-29 22:57:12 +00:00
Adrian Prantl 4a09777b37 Upgrade some wildly anachronistic debug info in testcases.
llvm-svn: 264797
2016-03-29 22:34:30 +00:00
Nirav Dave 2aab7f4358 Add support for no-jump-tables
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support

Reviewers: hans, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18321

llvm-svn: 264756
2016-03-29 17:46:23 +00:00
Elena Demikhovsky 0053eb9479 fixed typo - CHECK-LABEL
llvm-svn: 264702
2016-03-29 06:49:38 +00:00