Commit Graph

22276 Commits

Author SHA1 Message Date
Volkan Keles 20d3c4200d [GlobalISel] Translate floating-point negation
Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30671

llvm-svn: 297171
2017-03-07 18:03:28 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Sanjay Patel 7f18ec50ba [DAG] refactor related div/rem folds; NFCI
This is known incomplete and not called in the right order relative to
other folds, but that's the current behavior. I'm just trying to clean
this up before making actual functional changes to make the patch smaller.

The logic here should mimic the IR equivalents that are in InstSimplify's
simplifyDivRem().

llvm-svn: 297086
2017-03-06 22:32:40 +00:00
Paul Robinson f96e21ad6d [DWARFv5] Update definitions to match published spec.
Some late additions to DWARF v5 were not in Dwarf.def; also one form
was redefined.  Add the new cases to relevant switches in different
parts of LLVM.  Replace DW_FORM_ref_sup with DW_FORM_ref_sup[4,8].

I did not add support for DW_FORM_strx3/addrx3 other that defining the
constants. We don't have any infrastructure to support these.

Differential Revision: http://reviews.llvm.org/D30664

llvm-svn: 297085
2017-03-06 22:20:03 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Krzysztof Parzyszek 5b8fae5edd [IfConversion] Only renormalize probabilities if branches are analyzable
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.

Differential Revision: https://reviews.llvm.org/D30556

llvm-svn: 297054
2017-03-06 19:12:42 +00:00
Tim Northover 95b6d5f2b1 GlobalISel: don't emit degenerate G_INSERT instructions.
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.

So now we detect these degenerate cases and emit real casts instead.

llvm-svn: 297051
2017-03-06 19:04:17 +00:00
Tim Northover 81dafc1c88 GlobalISel: add buildUndef method to MachineIRBuilder. NFC.
llvm-svn: 297044
2017-03-06 18:36:40 +00:00
Tim Northover 75e0b91e59 GlobalISel: refactor legalization of G_INSERT.
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.

llvm-svn: 297042
2017-03-06 18:23:04 +00:00
Sanjay Patel 7f7947bf41 [DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.

This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html

Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.

The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.

llvm-svn: 297026
2017-03-06 16:36:42 +00:00
Sanjay Patel 6b029a5380 [DAG] fix formatting; NFC
llvm-svn: 297015
2017-03-06 15:27:57 +00:00
Sanjay Patel 5273afd4bb [DAG] fix typo in comment; NFC
llvm-svn: 297011
2017-03-06 15:07:43 +00:00
Dean Michael Berris 7e8eea429f [XRay] Allow logging the first argument of a function call.
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.

For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.

Reviewers: dberris

Reviewed By: dberris

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29702

llvm-svn: 296998
2017-03-06 06:48:56 +00:00
Simon Pilgrim 584d6d9d91 [SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions
Found by fuzz testing after rL296985 landed

llvm-svn: 296989
2017-03-05 15:52:18 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Craig Topper 6ffc044b2f [DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.

llvm-svn: 296981
2017-03-05 01:08:16 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Florian Hahn 6406f98342 [legalize-types] Remove stale entries from SoftenedFloats.
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).

When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).

This fixes the fp128 test failures with expensive checks for X86.

I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.

Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon

Reviewed By: chh

Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D29265

llvm-svn: 296964
2017-03-04 12:00:35 +00:00
Eli Friedman 49f0220086 [MISched] Remove unused arguments. NFC.
llvm-svn: 296934
2017-03-04 00:42:55 +00:00
Matthias Braun ffe40dd69e RegAllocGreedy: Follow-up to r296722
We can now end up in situations where we initiate LiveIntervalUnion
queries with different SubRanges against the same register unit, so the
assert() no longer holds in all cases. Just recalculate now when we know
the cache is out of date.

llvm-svn: 296928
2017-03-03 23:27:20 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Tim Northover bf017293af GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.

I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.

llvm-svn: 296921
2017-03-03 22:46:09 +00:00
Matthias Braun a04d7ad851 RegisterCoalescer: Simplify subrange splitting code; NFC
- Use slightly better variable names / compute in a more direct way.

llvm-svn: 296905
2017-03-03 19:05:34 +00:00
Simon Pilgrim 6dfab414db Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call
llvm-svn: 296886
2017-03-03 17:03:52 +00:00
Simon Pilgrim cf12b5e1a6 Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296879
2017-03-03 16:35:57 +00:00
Simon Pilgrim 10754abe7e Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296871
2017-03-03 14:25:46 +00:00
Simon Pilgrim b01bb3a1b2 Fix Wdocumentation warning
llvm-svn: 296866
2017-03-03 12:09:11 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Adrian Prantl ea8880b81f LiveDebugValues: Assume calls never clobber SP.
A call should never modify the stack pointer, but some backends are
not so sure about this and never list SP in the regmask. For the
purposes of LiveDebugValues we assume a call never clobbers SP. We
already have a similar workaround in DbgValueHistoryCalculator (which
we hopefully can retire soon).

This fixes the availabilty of local ASANified variables on AArch64.

rdar://problem/27757381

llvm-svn: 296847
2017-03-03 01:08:25 +00:00
Kyle Butt 1fa6030767 CodeGen: BlockPlacement: Precompute layout for chains of triangles.
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
   longer chains)
2) The chains are statically correlated. Branch probabilities have a very
   U-shaped distribution.
   [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
   If the branches in a chain are likely to be from the same side of the
   distribution as their predecessor, but are independent at runtime, this
   transformation is profitable. (Because the cost of being wrong is a small
   fixed cost, unlike the standard triangle layout where the cost of being
   wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
   branch was taken positively influences whether the next branch will be
   taken
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845
2017-03-03 01:00:22 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Sanjay Patel 7884dcb788 [DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI
llvm-svn: 296824
2017-03-02 21:56:43 +00:00
Kyle Butt 1393761e0c CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818
2017-03-02 21:44:24 +00:00
Zachary Turner d9dc2829ea [Support] Move Stream library from MSF -> Support.
After several smaller patches to get most of the core improvements
finished up, this patch is a straight move and header fixup of
the source.

Differential Revision: https://reviews.llvm.org/D30266

llvm-svn: 296810
2017-03-02 20:52:51 +00:00
Sanjay Patel 209b0f9aad [DAG] improve documentation comments; NFC
llvm-svn: 296808
2017-03-02 20:48:08 +00:00
Simon Pilgrim 7b227fecb5 Fix some Wdocumentation warnings
llvm-svn: 296783
2017-03-02 18:59:07 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Sanjay Patel f7aba7ba22 fix typo in comment; NFC
llvm-svn: 296760
2017-03-02 16:37:24 +00:00
Serge Pavlov e2bf69715f Do not verify MachimeDominatorTree if it is not calculated
If dominator tree is not calculated or is invalidated, set corresponding
pointer in the pass state to nullptr. Such pointer value will indicate
that operations with dominator tree are not allowed. In particular, it
allows to skip verification for such pass state. The dominator tree is
not calculated if the machine dominator pass was skipped, it occures in
the case of entities with linkage available_externally.

The change fixes some test fails observed when expensive checks
are enabled.

Differential Revision: https://reviews.llvm.org/D29280

llvm-svn: 296742
2017-03-02 12:00:10 +00:00
Matthias Braun dbcf9e2ee4 LiveRegMatrix: Fix some subreg interference checks
Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the apropriate subregister range
resulting in unnecessarily conservative results.

llvm-svn: 296722
2017-03-02 00:35:08 +00:00
Paul Robinson a94f76b18c Remove spurious use of LLVM_FALLTHROUGH (NFC)
llvm-svn: 296713
2017-03-01 23:59:11 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflow.
Summary:
This can be used to optimize large multiplications after legalization.

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Victor Leschuk d7bfa40ace [DebugInfo] [DWARFv5] Unique abbrevs for DIEs with different implicit_const values
Take DW_FORM_implicit_const attribute value into account when profiling
DIEAbbrevData.

Currently if we have two similar types with implicit_const attributes and
different values we end up with only one abbrev in .debug_abbrev section.
For example consider two structures: S1 with implicit_const attribute ATTR
and value VAL1 and S2 with implicit_const ATTR and value VAL2.
The .debug_abbrev section will contain only 1 related record:

[N] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL1
        // ....

This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1.

With this patch we will have two different abbreviations here:

[N] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL1
        // ....

[M] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL2
        // ....

llvm-svn: 296691
2017-03-01 22:13:42 +00:00
Benjamin Kramer 0e429606b0 [DAGCombiner] Remove non-ascii character and reflow comment.
llvm-svn: 296690
2017-03-01 22:10:43 +00:00
Matthias Braun 173e11439e LIU:::Query: Query LiveRange instead of LiveInterval; NFC
- We only need the information from the base class, not the additional
  details in the LiveInterval class.
- Spread more `const`
- Some code cleanup

llvm-svn: 296684
2017-03-01 21:48:12 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Matthias Braun 702f55bb4a LIU::Query: Remove always false member+getter; NFC
llvm-svn: 296675
2017-03-01 21:02:52 +00:00
Nemanja Ivanovic b223cfabcc Improve scheduling with branch coalescing
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D28249

llvm-svn: 296670
2017-03-01 20:29:34 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Paul Robinson d4f1c487f3 Alphabetize some cases (NFC)
llvm-svn: 296655
2017-03-01 19:01:47 +00:00
Paul Robinson 91d74813a6 [DWARF] Default lower bound should respect requested DWARF version.
DWARF may define a default lower-bound for arrays in languages defined
in a particular DWARF version.  But the logic to suppress an
unnecessary lower-bound attribute was looking at the hard-coded
default DWARF version, not the version that had been requested.

Also updated the list with all languages defined in DWARF v5.

Differential Revision: http://reviews.llvm.org/D30484

llvm-svn: 296652
2017-03-01 18:32:37 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Ahmed Bougacha 20b3e9a835 [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall.
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.

If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).

We need to erase them, as we know they aren't necessary (in addition to
being incorrect).

We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.

llvm-svn: 296552
2017-03-01 00:43:42 +00:00
Ahmed Bougacha 67d1c7c3c2 [GlobalISel] Replace all combined G_EXTRACT uses.
Iterating on the use-list we're modifying doesn't work: after the first
iteration, the use-list iterator will point to a MachineOperand
referencing the new register.  This caused us to skip the other uses to
replace.

Instead, use MRI.replaceRegWith(), which accounts for this behavior.

llvm-svn: 296551
2017-03-01 00:43:39 +00:00
Paul Robinson 3443575c03 Add missing module/license header. NFC.
llvm-svn: 296550
2017-03-01 00:14:42 +00:00
Paul Robinson cddd60445e [DWARFv5] Emit new unit header format.
Requesting DWARF v5 will now get you the new compile-unit and
type-unit headers.  llvm-dwarfdump will also recognize them.

Differential Revision: http://reviews.llvm.org/D30206

llvm-svn: 296514
2017-02-28 20:24:55 +00:00
Sanjay Patel ea61ea9f19 [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC
llvm-svn: 296502
2017-02-28 18:41:49 +00:00
Craig Topper 419f145ebb [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 296486
2017-02-28 16:52:05 +00:00
David Bozier 5159968786 [Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied.

Patch by: James Henderson

Differential Revision: https://reviews.llvm.org/D29023

llvm-svn: 296483
2017-02-28 16:02:37 +00:00
Daniel Sanders 983c9b98e9 Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.

llvm-svn: 296478
2017-02-28 15:00:27 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Daniel Sanders a5afdefec6 [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 296474
2017-02-28 14:21:31 +00:00
Sanjoy Das eef785c1a5 [ImplicitNullCheck] Add alias analysis usage
Summary:
With this change ImplicitNullCheck optimization uses alias analysis
and can use load/store memory access for implicit null check if there
are other load/store before but memory accesses do not alias.

Patch by Serguei Katkov!

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30331

llvm-svn: 296440
2017-02-28 07:04:49 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Michael Kuperstein 13bf8a2684 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296416
2017-02-28 00:11:34 +00:00
Zachary Turner 695ed56ba5 [PDB] Make streams carry their own endianness.
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.

This makes the code cleaner and more general, but otherwise
has NFC.

llvm-svn: 296415
2017-02-28 00:04:07 +00:00
Eugene Zelenko fa912a7151 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 296404
2017-02-27 22:45:06 +00:00
Arnold Schwaighofer b2605f31ed ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.

rdar://30536186

llvm-svn: 296395
2017-02-27 22:12:06 +00:00
Zachary Turner 120faca41b [PDB] Partial resubmit of r296215, which improved PDB Stream Library.
This was reverted because it was breaking some builds, and
because of incorrect error code usage.  Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.

This portion is NFC, and consists of:

1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`.  These
   are now passed as `BinaryStreamRef X`.

llvm-svn: 296394
2017-02-27 22:11:43 +00:00
Matt Arsenault 4a7cc16e89 Revert "DAG: Check if extract_vector_elt is legal or custom"
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.

llvm-svn: 296393
2017-02-27 21:59:07 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Taewook Oh a49eb8578a [TailDuplicator] Maintain DebugLoc for branch instructions
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.

Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie

Reviewed By: dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D30026

llvm-svn: 296371
2017-02-27 19:30:01 +00:00
Artur Pilipenko f7196c8d9e [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets
This pattern is essentially a i16 load from p+1 address:

  %p1.i16 = bitcast i8* %p to i16*
  %p2.i8 = getelementptr i8, i8* %p, i64 2
  %v1 = load i16, i16* %p1.i16
  %v2.i8 = load i8, i8* %p2.i8
  %v2 = zext i8 %v2.i8 to i16
  %v1.shl = shl i16 %v1, 8
  %res = or i16 %v1.shl, %v2

Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.

llvm-svn: 296336
2017-02-27 13:04:23 +00:00
Artur Pilipenko c43b20a43b [DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper
This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296332
2017-02-27 11:42:54 +00:00
Artur Pilipenko f2c26e0bf2 [DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node
This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296331
2017-02-27 11:40:14 +00:00
Daniel Jasper 3ca4525612 Revert "[CGP] Split some critical edges coming out of indirect branches"
This reverts commit r296149 as it leads to crashes when compiling for
PPC.

llvm-svn: 296295
2017-02-26 11:09:12 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Craig Topper 3b8aca2ecf [ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions
Summary:
While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object.

To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler.

The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation.

Fixes PR30284.

Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30242

llvm-svn: 296260
2017-02-25 18:12:25 +00:00
Artyom Skrobov ac56719231 No need to copy the variable [NFC]
llvm-svn: 296259
2017-02-25 17:18:09 +00:00
NAKAMURA Takumi 05a75e40da Revert r296215, "[PDB] General improvements to Stream library." and followings.
r296215, "[PDB] General improvements to Stream library."
r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily."
r296220, "Re-enable BinaryStreamTest.StreamReaderObject."
r296244, "[PDB] Disable some tests that are breaking bots."
r296249, "Add static_cast to silence -Wc++11-narrowing."

std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission.
(Seek discussions around llvm/xray.)

I could substitute s/no_buffer_space/others/g, but I revert whole them ATM.

Could we define and use LLVM errors there?

llvm-svn: 296258
2017-02-25 17:04:23 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Junmo Park 061bec802e Minor code cleanup. NFC.
llvm-svn: 296222
2017-02-25 01:50:45 +00:00
Zachary Turner af299ea5d4 [PDB] General improvements to Stream library.
This adds various new functionality and cleanup surrounding the
use of the Stream library.  Major changes include:

* Renaming of all classes for more consistency / meaningfulness
* Addition of some new methods for reading multiple values at once.
* Full suite of unit tests for reader / writer functionality.
* Full set of doxygen comments for all classes.
* Streams now store their own endianness.
* Fixed some bugs in a few of the classes that were discovered
  by the unit tests.

llvm-svn: 296215
2017-02-25 00:44:30 +00:00
Zachary Turner d2684b7969 [PDB] Rename Stream related source files.
This is part of a larger effort to get the Stream code moved
up to Support.  I don't want to do it in one large patch, in
part because the changes are so big that it will treat everything
as file deletions and add, losing history in the process.
Aside from that though, it's just a good idea in general to
make small changes.

So this change only changes the names of the Stream related
source files, and applies necessary source fix ups.

llvm-svn: 296211
2017-02-25 00:33:34 +00:00
Dan Gohman 82607f56bd [WebAssembly] Add support for using a wasm global for the stack pointer.
This replaces the __stack_pointer variable which was allocated in linear
memory.

llvm-svn: 296201
2017-02-24 23:46:05 +00:00
Dan Gohman d934cb8806 [WebAssembly] Basic support for Wasm object file encoding.
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.

llvm-svn: 296190
2017-02-24 23:18:00 +00:00
Stanislav Mekhanoshin 42259cf35e Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Tim Northover ef29e7284b GlobalISel: check for CImm rather than Imm on G_CONSTANTs.
All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm
(i.e. a ConstantInt), so that's what the selector needs to look for.

llvm-svn: 296176
2017-02-24 21:21:38 +00:00
Eli Friedman c12a5a7595 [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.

Differential Revision: https://reviews.llvm.org/D30114

llvm-svn: 296167
2017-02-24 20:51:36 +00:00
Michael Kuperstein 46b131e3f8 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296149
2017-02-24 18:41:32 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Justin Bogner 259a0cf3ee Add missing initialization for MachineOptimizationRemarkEmitter
This was missed in r293110.

llvm-svn: 296096
2017-02-24 07:42:35 +00:00
Craig Topper c446b1ffae [ExecutionDepsFix] Use range-based for loop. NFC
llvm-svn: 296093
2017-02-24 06:38:24 +00:00
Michael Kuperstein 581c9f4b20 Revert r269060 to pacify bots.
llvm-svn: 296064
2017-02-24 01:22:19 +00:00
Michael Kuperstein 12e79d5002 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296060
2017-02-24 00:56:21 +00:00
Ahmed Bougacha 7daaf88c70 [GlobalISel] Use the same name for all remarks.
While there, switch to the explicit ctor.

llvm-svn: 296059
2017-02-24 00:34:47 +00:00
Ahmed Bougacha 7c88a4e12b [GlobalISel] Use the DISubprogram for translation failure remarks.
Justin added support for DISubprogram locs in r295531 and r296052.
Use that instead of no-loc for constants and arguments.

llvm-svn: 296058
2017-02-24 00:34:44 +00:00
Ahmed Bougacha 8f9e99bcb6 [GlobalISel] Remove now-unnecessary variable. NFC.
Since r296047, we're able to return early on failures.
Don't track whether we succeeded.

llvm-svn: 296057
2017-02-24 00:34:41 +00:00
Justin Bogner 369ba753aa OptDiag: Summarize the instruction count in asm-printer
Add an optimization remark to asm-printer that summarizes the number
of instructions emitted per function.

llvm-svn: 296053
2017-02-24 00:19:22 +00:00
Ahmed Bougacha 4f8dd0202d [GlobalISel] Don't translate other blocks when one failed.
We were stopping the translation of the parent block when the
translation of an instruction failed, but we were still trying to
translate the other blocks of the parent function.

Don't do that.

llvm-svn: 296047
2017-02-23 23:57:36 +00:00
Ahmed Bougacha eceabddcfd [GlobalISel] Finalize translated function on scope exit. NFC.
This is the compromise between having a per-function IRTranslator
and manually managing the per-function state.

llvm-svn: 296046
2017-02-23 23:57:28 +00:00
Kyle Butt ebe6cc4dad CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.
Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https://reviews.llvm.org/D30308

llvm-svn: 296018
2017-02-23 21:22:24 +00:00
Ahmed Bougacha ae9dadecf3 [GlobalISel] Emit opt remarks on isel fallbacks.
Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.

We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).

The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors.  It's not clear whether that
would be useful for other remarks though, so we're not there yet.

llvm-svn: 296013
2017-02-23 21:05:42 +00:00
Ahmed Bougacha 272739751d [CodeGen] Teach opt remarks how to print MI instructions.
This will be used with GISel opt remarks.

llvm-svn: 296012
2017-02-23 21:05:33 +00:00
Ahmed Bougacha 97119d48db [CodeGen] Print MI without a newline when skipping debugloc. NFC.
This matches the behavior for skip-operands. While there, document it.
This is a follow-up to r296007.

llvm-svn: 296011
2017-02-23 21:05:29 +00:00
Stanislav Mekhanoshin ce3ddd2de4 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

llvm-svn: 296009
2017-02-23 20:19:44 +00:00
Ahmed Bougacha 4319224628 [CodeGen] Add a way to SkipDebugLoc in MachineInstr::print(). NFC.
llvm-svn: 296007
2017-02-23 19:17:31 +00:00
Ahmed Bougacha 1b3339447d [GlobalISel] Simplify Select type cleanup using a ScopeExit. NFC.
This lets us use more natural early-returns when selection fails.

llvm-svn: 296006
2017-02-23 19:17:24 +00:00
Sanjay Patel 4a4fbe162f [DAG] add convenience function to get -1 constant; NFCI
llvm-svn: 296004
2017-02-23 19:02:33 +00:00
Adam Nemet b516cf3f3f [LazyMachineBFI] Reimplement with getAnalysisIfAvailable
Since LoopInfo is not available in machine passes as universally as in IR
passes, using the same approach for OptimizationRemarkEmitter as we did for IR
will run LoopInfo and DominatorTree unnecessarily.  (LoopInfo is not used
lazily by ORE.)

To fix this, I am modifying the approach I took in D29836.  LazyMachineBFI now
uses its client passes including MachineBFI itself that are available or
otherwise compute them on the fly.

So for example GreedyRegAlloc, since it's already using MBFI, will reuse that
instance.  On the other hand, AsmPrinter in Justin's patch will generate DT,
LI and finally BFI on the fly.

(I am of course wondering now if the simplicity of this approach is even
preferable in IR.  I will do some experiments.)

Testing is provided by an updated version of D29837 which requires Justin's
patch to bring ORE to the AsmPrinter.

Differential Revision: https://reviews.llvm.org/D30128

llvm-svn: 295996
2017-02-23 17:30:01 +00:00
Simon Pilgrim 858d8e672d Fix signed/unsigned comparison warning on MSVC
llvm-svn: 295962
2017-02-23 12:00:34 +00:00
Eugene Zelenko db56e5a89a [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295893
2017-02-22 22:32:51 +00:00
Bill Seurer 8e48f416ad [DAGCombiner] revert r295336
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.

Reverting as per the developer's request.

llvm-svn: 295849
2017-02-22 16:27:33 +00:00
Igor Breger f7359d893a [X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
2017-02-22 12:25:09 +00:00
Dan Gohman 18eafb6c68 [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722

llvm-svn: 295803
2017-02-22 01:23:18 +00:00
Matt Arsenault f0a4823b91 DAG: Check if extract_vector_elt is legal or custom
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.

llvm-svn: 295782
2017-02-21 22:47:27 +00:00
Eugene Zelenko 49e2fc4f5f [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295773
2017-02-21 22:07:52 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Matthias Braun 9ab403942b ScheduleDAG: Cleanup; NFC
- Fix doxygen comments (do not repeat documented name, remove definition
    comment if there is already one at the declaration, add \p, ...)
- Add some const modifiers
- Use range based for

llvm-svn: 295688
2017-02-21 01:27:33 +00:00
Taewook Oh 4cf5c1087c [BranchFolding] Update debug location along with the update of branch instruction.
Summary:
Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of

```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```

, but this information is gone when then block is lowered because BranchFolder misses it. This patch is a fix for this issue.

Reviewers: qcolombet, aprantl, craig.topper, MatzeB

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29902

llvm-svn: 295684
2017-02-21 00:12:38 +00:00
Simon Pilgrim c0dc9a4913 Strip trailing whitespace.
llvm-svn: 295653
2017-02-20 11:56:43 +00:00
Simon Pilgrim 50b958c07a [SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes.
Thanks to Mikael Holmén for the initial test case

llvm-svn: 295652
2017-02-20 11:55:58 +00:00
Artyom Skrobov be31754094 Remove redundant call to GluedNodes.back() [NFC]
llvm-svn: 295607
2017-02-19 16:56:18 +00:00
Matthias Braun 431305927f MachineRegionInfo: Fix pass initialization
- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
  BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in
  the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
  MachinePostDominatorTree and MachineDominanceFrontier instead of their
  respective IR versions.
- Since there were no tests for this, add a X86 MIR test.

Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com>

llvm-svn: 295518
2017-02-18 00:41:16 +00:00
Eugene Zelenko be37db1882 [CodeGen] Revert changes in LowLevelType to pre-r295499 to fix broken buildbots.
llvm-svn: 295505
2017-02-17 22:23:34 +00:00
Eugene Zelenko 5db84df728 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295499
2017-02-17 21:43:25 +00:00
Adrian Prantl 67c2442210 Debug Info: Sort frame index expressions before emitting them.
This fixes PR31381, which caused an assertion and/or invalid debug info.

This affects debug variables that have multiple fragments in the MMI
side (i.e.: in the stack frame) table.
rdar://problem/30571676

llvm-svn: 295486
2017-02-17 19:42:32 +00:00
Tim Northover 88634996c7 GlobalISel: verify that generic loads & stores have a mem operand.
The mem operand is used by GlobalISel to convey atomic constraints so dropping
it is invalid.

llvm-svn: 295476
2017-02-17 18:50:15 +00:00
Sanjay Patel 7f2e58972c [DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI
I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.

llvm-svn: 295463
2017-02-17 17:13:27 +00:00
Simon Pilgrim 0429c0cf8b Fix signed/unsigned comparison warning.
llvm-svn: 295453
2017-02-17 16:01:16 +00:00
Simon Pilgrim 511d788a95 [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).

This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.

The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).

Differential Revision: https://reviews.llvm.org/D29454

llvm-svn: 295451
2017-02-17 15:14:48 +00:00
Sanjay Patel 5573042035 [DAGCombiner] improve readability; NFCI
llvm-svn: 295447
2017-02-17 14:21:59 +00:00
Teresa Johnson 76b5b7493c Handle link of NoDebug CU with a CU that has debug emission enabled
Summary:
This is an issue both with regular and Thin LTO. When we link together
a DICompileUnit that is marked NoDebug (e.g when compiling with -g0
but applying an AutoFDO profile, which requires location tracking
in the compiler) and a DICompileUnit with debug emission enabled,
we can have failures during dwarf debug generation. Specifically,
when we have inlined from the NoDebug compile unit into the debug
compile unit, we can fail during construction of the abstract and
inlined scope DIEs. This is because the SPMap does not include NoDebug
CUs (they are skipped in the debug_compile_units_iterator).

This patch fixes the failures by skipping locations from NoDebug CUs
when extracting lexical scopes.

Reviewers: dblaikie, aprantl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29765

llvm-svn: 295384
2017-02-17 00:21:19 +00:00
Benjamin Kramer 3f6260cab4 [MachinePipeliner] Remove redundant destructor. NFC.
llvm-svn: 295372
2017-02-16 20:26:51 +00:00
David Blaikie b2fbb4b276 Refactor DebugHandlerBase a bit to common non-debug-having-function filtering
llvm-svn: 295354
2017-02-16 18:48:33 +00:00
Artur Pilipenko 85d758299e [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit -r295314 with PowerPC and AMDGPU tests updated.

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295336
2017-02-16 17:07:27 +00:00
Artur Pilipenko a1b384c4ce Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some of AMDGPU and PowerPC tests to fail.

llvm-svn: 295316
2017-02-16 13:04:46 +00:00
Artur Pilipenko daaa0c0f7d [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295314
2017-02-16 12:53:26 +00:00
Diana Picus ca6a890d7f [ARM] GlobalISel: Lower double precision FP args
For the hard float calling convention, we just use the D registers.

For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.

For pure soft float targets, we still bail out because we don't support the
libcalls yet.

llvm-svn: 295295
2017-02-16 07:53:07 +00:00
Hans Wennborg a468601e0e [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

llvm-svn: 295262
2017-02-16 00:04:05 +00:00
Tim Northover 9136617a3f GlobalISel: legalize va_arg on AArch64.
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.

llvm-svn: 295255
2017-02-15 23:22:50 +00:00
Tim Northover 4a652227dd GlobalISel: support translating va_arg
Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.

llvm-svn: 295254
2017-02-15 23:22:33 +00:00
Matt Arsenault 900b21c350 Fix typos
llvm-svn: 295246
2017-02-15 22:19:06 +00:00
Matt Arsenault 5de8dc9cf5 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

llvm-svn: 295242
2017-02-15 22:02:42 +00:00
Kyle Butt 7fbec9bdf1 Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.

because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

llvm-svn: 295223
2017-02-15 19:49:14 +00:00
Xinliang David Li 538d666814 include function name in dot filename
Differential Revision: http://reviews.llvm.org/D29975

llvm-svn: 295220
2017-02-15 19:21:04 +00:00