Commit Graph

142829 Commits

Author SHA1 Message Date
Daniel Berlin 49a34165d2 NewGVN: Print out DefiningAccess for both loads and stores when debugging.
llvm-svn: 290782
2016-12-31 07:34:36 +00:00
Philip Reames 0ef5d288b4 [SmallPtrSet] Introduce a find primitive and rewrite count/erase in terms of it
This was originally motivated by a compile time problem I've since figured out how to solve differently, but the cleanup seemed useful. We had the same logic - which essentially implemented find - in several places. By commoning them out, I can implement find and allow erase to be inlined at the call sites if profitable.

Differential Revision: https://reviews.llvm.org/D28183

llvm-svn: 290779
2016-12-31 02:33:22 +00:00
Dylan McKay 97cf837b46 [AVR] Optimize 16-bit ANDs with '1'
Summary: Fixes PR 31345

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28186

llvm-svn: 290778
2016-12-31 01:07:14 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
Simon Pilgrim c5fde8d748 [X86][AVX512DQ] Add truncated math tests for AVX512DQ.
llvm-svn: 290772
2016-12-30 22:43:41 +00:00
Simon Pilgrim 85af973506 [X86][SSE] Fix truncated math test names.
Inconsistent naming convention and wrong name for some input/output types.

llvm-svn: 290771
2016-12-30 22:40:32 +00:00
Simon Pilgrim 712374169d [X86][AVX512] Regenerate test - missing shuffle comments
llvm-svn: 290770
2016-12-30 22:31:33 +00:00
Philip Reames fac031a178 Add a comment for a todo in LoopUnroll post cleanup
llvm-svn: 290769
2016-12-30 22:10:19 +00:00
Philip Reames fdbb05b469 [LVI] Remove count/erase idiom in favor of checking result value of erase
Minor compile time win.  Avoids an additional O(N) scan in the case where we are removing an element and costs nothing when we aren't.

llvm-svn: 290768
2016-12-30 22:09:10 +00:00
Florian Hahn e7407ba1ef [doc] Clarify steps for contributors without commit access.
Summary: Update the Phabricator docs to clarify how changes are merged for contributors without commit access. 

Reviewers: delcypher, aaron.ballman

Subscribers: aaron.ballman, anmol, llvm-commits

Differential Revision: https://reviews.llvm.org/D28184

llvm-svn: 290767
2016-12-30 21:28:30 +00:00
Saleem Abdulrasool de9f00eecd DebugInfo: change the PDB UniqueId type to uint8_t
Since we type-erase the Windows GUID structure, use unsigned bytes
rather than char, which may be signed (-fsigned-char).  NFC

llvm-svn: 290765
2016-12-30 19:42:13 +00:00
Piotr Padlewski da36215017 [MemDep] Handle gep with zeros for invariant.group
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because it make SROA can then handle it.

Simple case like

    void g(A &a) {
        z(a);
        if (glob)
            a.foo();
    }
    void testG() {
        A a;
        g(a);
    }

was not devirtualized with -fstrict-vtable-pointers because luck of
handling for gep 0 in Memory Dependence Analysis

Reviewers: dberlin, nlewycky, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28126

llvm-svn: 290763
2016-12-30 18:45:07 +00:00
Philip Reames a570a2303c [CVP] Adjust iteration order to reduce the amount of work required
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.

I noticed this via inspection and don't have a concrete example which points to the issue.  

llvm-svn: 290760
2016-12-30 18:00:55 +00:00
Philip Reames 1e48efcfc5 [LVI] Manually hoist computation from loop
Minor compile time win.  Not known to be a hot spot, just something I noticed while reading.

llvm-svn: 290759
2016-12-30 17:56:47 +00:00
Aaron Ballman 58a61e723e Caught a simple typo. I do not know of a way to test this, but it seems like an unlikely thing to regress in the future.
llvm-svn: 290757
2016-12-30 15:57:56 +00:00
Davide Italiano 75e39f9790 [NewGVN] Remove unneeded newline from assertion message.
llvm-svn: 290755
2016-12-30 15:01:17 +00:00
Abhilash Bhandari a8d45de6ce [ADT] Fix for compilation error when operator++(int) (post-increment function) of SmallPtrSetIterator is used.
The bug was introduced in r289619.

Reviewers: Mehdi Amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28134

llvm-svn: 290749
2016-12-30 12:34:36 +00:00
David Majnemer 5ec5f278c9 [InstCombine] Address post-commit feedback
llvm-svn: 290741
2016-12-30 03:36:17 +00:00
Mehdi Amini e2770c0b80 Fix test change in r290736: restore index generation
I remove one extra line, but because annoyingly llvm-lit does not
clean the output directory before running the test, it didn't fail
locally (the file was present from a previous run).

llvm-svn: 290740
2016-12-30 01:15:50 +00:00
Kostya Serebryany 11a22bc39d [libFuzzer] cleaner implementation of -print_pcs=1
llvm-svn: 290739
2016-12-30 01:13:07 +00:00
Michael Kuperstein 76e06c8858 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen cc76344ef5 Use continuous boosting factor for complete unroll.
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Mehdi Amini 30a9b6bb4e Replace test from using llvm-lto to use llvm-link (NFC)
Some incoming changes in ThinLTO will break this test.
Instead of relying on the heuristic to import, we
force the importing to happen with llvm-link.

llvm-svn: 290736
2016-12-30 00:45:26 +00:00
Michael Kuperstein 4a86a1921a [LICM] Remove unneeded tracking of whether changes were made. NFC.
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.

llvm-svn: 290735
2016-12-30 00:43:22 +00:00
Michael Kuperstein 62b98c3977 [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC.
llvm-svn: 290734
2016-12-30 00:39:00 +00:00
David Majnemer a1cfd7c5f8 [InstCombine] More thoroughly canonicalize the position of zexts
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible.  However, we didn't perform the same canonicalization
for zexts or for muls.

llvm-svn: 290733
2016-12-30 00:28:58 +00:00
Dylan McKay 453d042969 [AVR] Optimize 16-bit ORs with '0'
Summary: Fixes PR 31344

Authored by Anmol P. Paralkar

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28121

llvm-svn: 290732
2016-12-30 00:21:56 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Reid Kleckner e8ee89f8b0 Include <algorithm> for std::max etc
llvm-svn: 290730
2016-12-30 00:15:40 +00:00
Michael Kuperstein ff36baefe7 [LICM] Compute exit blocks for promotion eagerly. NFC.
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.

The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.

llvm-svn: 290729
2016-12-29 23:11:19 +00:00
Michael Kuperstein 5566092963 [LICM] Don't try to promote in loops where we have no chance to promote. NFC.
We would check whether we have a prehader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.

Instead, bail immediately if we don't have both.

llvm-svn: 290728
2016-12-29 22:51:22 +00:00
Michael Kuperstein b6da9cf3b7 [LICM] Only recompute LCSSA when we actually promoted something.
We want to recompute LCSSA only when we actually promoted a value.
This means we only need to look at changes made by promotion when
deciding whether to recompute it or not, not at regular sinking/hoisting.

(This was what the code was documented as doing, just not what it did)

Hopefully NFC.

llvm-svn: 290726
2016-12-29 22:37:13 +00:00
Daniel Berlin e0bd37e78f NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00
Craig Topper ea03513332 [Analysis] Remove repeated text from a comment. NFC
llvm-svn: 290723
2016-12-29 21:48:28 +00:00
Bryant Wong 507256b287 Fix indentation in r290716.
Use two-space indentation like the rest of the file.

llvm-svn: 290722
2016-12-29 20:05:51 +00:00
Justin Lebar 7cc6059058 [ADT] Rewrite IntrusiveRefCntPtr's comments. NFC
Edit for voice, and also add examples.  In particular, add an
explanation for why you might want to specialize IntrusiveRefCntPtrInfo,
which is not obvious.

llvm-svn: 290720
2016-12-29 19:59:38 +00:00
Justin Lebar 2d5622596a [ADT] Rename RefCountedBase::ref_cnt to RefCount. NFC
This makes it comply with the LLVM style guide, and also makes it
consistent with ThreadSafeRefCountedBase below.

llvm-svn: 290719
2016-12-29 19:59:34 +00:00
Justin Lebar a27accfe03 [ADT] clang-format IntrusiveRefCntrPtr.h. NFC
This file had some strange indentation.

Also remove some unnecessary whitespace between one-line member
functions.

llvm-svn: 290718
2016-12-29 19:59:30 +00:00
Justin Lebar 175ab74dc5 [ADT] Delete RefCountedBaseVPTR.
Summary:
This class is unnecessary.

Its comment indicated that it was a compile error to allocate an
instance of a class that inherits from RefCountedBaseVPTR on the stack.
This may have been true at one point, but it's not today.

Moreover you really do not want to allocate *any* refcounted object on
the stack, vptrs or not, so if we did have a way to prevent these
objects from being stack-allocated, we'd want to apply it to regular
RefCountedBase too, obviating the need for a separate RefCountedBaseVPTR
class.

It seems that the main way RefCountedBaseVPTR provides safety is by
making its subclass's destructor virtual.  This may have been helpful at
one point, but these days clang will emit an error if you define a class
with virtual functions that inherits from RefCountedBase but doesn't
have a virtual destructor.

Reviewers: compnerd, dblaikie

Subscribers: cfe-commits, klimek, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D28162

llvm-svn: 290717
2016-12-29 19:59:26 +00:00
Bryant Wong 291264b612 Correctly handle multi-lined RUN lines.
`utils/update_{llc_test,test}_checks` ought to be able to handle RUN commands
that span multiple lines, as shown in the example at
http://llvm.org/docs/CommandGuide/FileCheck.html#the-filecheck-check-prefix-option

Differential Revision: https://reviews.llvm.org/D26523

llvm-svn: 290716
2016-12-29 19:32:34 +00:00
Justin Lebar 25eeb38acc [ADT] Use memcpy for type punning in MathExtras.
Summary: Previously we type-punned through a union, which is not safe.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28161

llvm-svn: 290715
2016-12-29 18:15:34 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Sanjoy Das 00d76a5754 [TBAAVerifier] Be stricter around verifying scalar nodes
This fixes the issue exposed in PR31393, where we weren't trying
sufficiently hard to diagnose bad TBAA metadata.

This does reduce the variety in the error messages we print out, but I
think the tradeoff of verifying more, simply and quickly overrules the
need for more helpful error messags here.

llvm-svn: 290713
2016-12-29 15:47:05 +00:00
Sanjoy Das 600d2a5a6b [TBAAVerifier] Make things const-consistent; NFC
llvm-svn: 290712
2016-12-29 15:47:01 +00:00
Sanjoy Das 55f12d9de9 [TBAAVerifier] Memoize validity of scalar tbaa nodes; NFCI
llvm-svn: 290711
2016-12-29 15:46:57 +00:00
Artem Tamazov 25478d821b [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
Among other stuff, this allows to use predefined .option.machine_version_major
/minor/stepping symbols in the directive.

Relevant test expanded at once (also file renamed for clarity).

Differential Revision: https://reviews.llvm.org/D28140

llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Igor Laevsky fedab1572d Fix documentation generator warnings after rL290708.
llvm-svn: 290709
2016-12-29 15:08:57 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Craig Topper 17b5568bc7 [InstCombine] Use getVectorNumElements instead of explicitly casting to VectorType and calling getNumElements. NFC
llvm-svn: 290707
2016-12-29 07:03:18 +00:00
Craig Topper 62f06e241b [InstCombine] Fix typo in comment. NFC
llvm-svn: 290706
2016-12-29 05:38:31 +00:00
Craig Topper 2e18bcfc60 [InstCombine] Use a 32-bits instead of 64-bits for storing the number of elements in VectorType for a ShuffleVector. While there getVectorNumElements to avoid an explicit cast. NFC
llvm-svn: 290705
2016-12-29 04:24:32 +00:00
Craig Topper 1a8a3377cc [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used make sure we add it to the worklist so we can DCE it sooner.
We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded.

llvm-svn: 290704
2016-12-29 03:30:17 +00:00
Kostya Serebryany d723804fa2 [libFuzzer] make __sanitizer_cov_trace_switch more predictable
llvm-svn: 290703
2016-12-29 02:50:35 +00:00
Craig Topper b57a84dace [InstCombine] Fix some of the AVX-512 scalar arithmetic test cases to do a better job of testing what they intended to test.
The accidentally had trivially dead code. Also needed to adjust the rounding mode to not CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts.

llvm-svn: 290702
2016-12-29 02:29:04 +00:00
Mehdi Amini fce3af0192 Remove BitstreamWriter::Emit64(), it was never called (NFC)
llvm-svn: 290701
2016-12-29 01:40:53 +00:00
Reid Kleckner 32f171fec4 Fix mingw build by moving the static const data member before the bitfields
Apparently GCC targeting Windows breaks bitfields on static data members:
  struct Foo {
    unsigned X : 16;
    static const int M = 42;
    unsigned Y : 16;
  };
  static_assert(sizeof(Foo) == 4, "asdf"); // fails

Who knew.

llvm-svn: 290700
2016-12-29 01:14:41 +00:00
Daniel Berlin 6658cc9ead NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.
Summary:
The optimal iteration order for this problem is RPO order. We want to
process as many preds of a backedge as we can before we process the
backedge.

At the same time, as we add predicate handling, we want to be able to
touch instructions that are dominated by a given block by
ranges (because a change in value numbering a predicate possibly
affects all users we dominate that are using that predicate).
If we don't do it this way, we can't do value inference over
backedges (the paper covers this in depth).

The newgvn branch currently overshoots the last part, and guarantees
that it will touch *at least* the right set of instructions, but it
does touch more.  This is because the bitvector instruction ranges are
currently generated in RPO order (so we take the max and the min of
the ranges of dominated blocks, which means there are some in the
middle we didn't have to touch that we did).

We can do better by sorting the dominator tree, and then just using
dominator tree order.

As a preliminary, the dominator tree has some RPO guarantees, but not
enough. It guarantees that for a given node, your idom must come
before you in the RPO ordering. It guarantees no relative RPO ordering
for siblings.  We add siblings in whatever order they appear in the module.

So that is what we fix.

We sort the children array of the domtree into RPO order, and then use
the dominator tree for ordering, instead of RPO, since the dominator
tree is now a valid RPO ordering.

Note: This would help any other pass that iterates a forward problem
in dominator tree order.  Most of them are single pass.  It will still
maximize whatever result they compute.  We could also build the
dominator tree in this order, but our incremental updates would still
put it out of sort order, and recomputing the sort order is almost as
hard as general incremental updates of the domtree.

Also note that the sorting does not affect any tests, etc. Nothing
depends on domtree order, including the verifier, the equals
functions for domtree nodes, etc.

How much could this matter, you ask?
Here are the current numbers.
This is generated by running NewGVN over all files in LLVM.

Note that once we propagate equalities, the differences go up by an
order of magnitude or two (IE instead of 29, the max ends up in the
thousands, since the worst case we add a factor of N, where N is the
number of branch predicates).  So while it doesn't look that stark for
the default ordering, it gets *much much* worse.  There are also
programs in the wild where the difference is already pretty stark
(2 iterations vs hundreds).

RPO ordering:
759040 Number of iterations is 1
112908 Number of iterations is 2

Default dominator tree ordering:
755081 Number of iterations is 1
116234 Number of iterations is 2
   603 Number of iterations is 3
    27 Number of iterations is 4
     2 Number of iterations is 5
     1 Number of iterations is 7

Dominator tree sorted:
759040 Number of iterations is 1
112908 Number of iterations is 2
<yay!>

Really bad ordering (sort domtree siblings in postorder. not quite the
worst possible, but yeah):
754008 Number of iterations is 1
    21 Number of iterations is 10
     8 Number of iterations is 11
     6 Number of iterations is 12
     5 Number of iterations is 13
     2 Number of iterations is 14
     2 Number of iterations is 15
     3 Number of iterations is 16
     1 Number of iterations is 17
     2 Number of iterations is 18
 96642 Number of iterations is 2
     1 Number of iterations is 20
     2 Number of iterations is 21
     1 Number of iterations is 22
     1 Number of iterations is 29
 17266 Number of iterations is 3
  2598 Number of iterations is 4
   798 Number of iterations is 5
   273 Number of iterations is 6
   186 Number of iterations is 7
    80 Number of iterations is 8
    42 Number of iterations is 9

Reviewers: chandlerc, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28129

llvm-svn: 290699
2016-12-29 01:12:36 +00:00
Reid Kleckner e9c8d7f87b Add a static_assert about the sizeof(GlobalValue)
I added one for Value back in r262045, and I'm starting to think we
should have these for any class with bitfields whose memory efficiency
really matters.

llvm-svn: 290698
2016-12-29 00:55:51 +00:00
Daniel Berlin 7ad1ea0984 Update equalsStoreHelper for the fact that only one branch can be true
llvm-svn: 290697
2016-12-29 00:49:32 +00:00
Justin Lebar ddece375a1 [GlobalValue] Move HasLLVMReservedName into existing bitfield. NFC
Summary:
Follow-up to r290691, where I introduced HasLLVMReservedName.  rnk
pointed out that that patch added an extra word to GlobalValue on MSVC,
because it doesn't pack bitfields with different types.

This patch moves HasLLVMReservedName into the existing bitfield, where
we appear to have plenty of bits to spare.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28149

llvm-svn: 290696
2016-12-29 00:30:46 +00:00
Justin Lebar 23a53501a4 [IR] Clarify that Value::getName() is not actually cheap.
It involves a hashtable lookup when the Value has a name.

llvm-svn: 290695
2016-12-29 00:30:42 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Mehdi Amini 5022bb7238 Change Metadata Index emission in the bitcode to use 2x32 bits for the placeholder
The Bitstream reader and writer are limited to handle a "size_t" at
most, which means that we can't backpatch and read back a 64bits
value on 32 bits platform.

llvm-svn: 290693
2016-12-28 23:45:54 +00:00
Piotr Padlewski 6c37d298d9 Revert "[NewGVN] replace emplace_back with push_back"
llvm-svn: 290692
2016-12-28 23:24:02 +00:00
Justin Lebar 291abd3ebb Speed up Function::isIntrinsic() by adding a bit to GlobalValue. NFC
Summary:
Previously isIntrinsic() called getName().  This involves a hashtable
lookup, so is nontrivially expensive.  And isIntrinsic() is called
frequently, particularly by dyn_cast<IntrinsicInstr>.

This patch steals a bit of IntID and uses that to store whether or not
getName() starts with "llvm."

Reviewers: bogner, arsenm, joker-eph

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D22949

llvm-svn: 290691
2016-12-28 22:59:45 +00:00
Mehdi Amini e98f925834 Add an index for Module Metadata record in the bitcode
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Recommit r290684 (was reverted in r290686 because a test
was broken) after adding a threshold to avoid emitting
the index when unnecessary (little amount of metadata).
This optimization "hides" a limitation of the ability
to backpatch in the bitstream: we can only backpatch
safely when the position has been flushed. So if we emit
an index for one metadata, it is possible that (part of)
the offset placeholder hasn't been flushed and the backpatch
will fail.

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290690
2016-12-28 22:30:28 +00:00
Saleem Abdulrasool 2b59eca1f7 Revert "Add an index for Module Metadata record in the bitcode"
This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836.  Revert at
Mehdi's request as it is breaking bots.

llvm-svn: 290686
2016-12-28 20:37:22 +00:00
Piotr Padlewski 629a7f2cc0 [NewGVN] replace emplace_back with push_back
emplace_back is not faster if it is equivalent to push_back. In this cases emplaced value had the
same type that the one stored in container. It is ugly and it might be even slower (see
Scott Meyers presentation about emplacement).

llvm-svn: 290685
2016-12-28 20:36:08 +00:00
Mehdi Amini 32ca148198 Add an index for Module Metadata record in the bitcode
Summary:
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Reviewers: pcc, tejohnson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290684
2016-12-28 19:44:19 +00:00
Piotr Padlewski 26dada79ff [NewGVN] Simplyfy loop NFC
llvm-svn: 290683
2016-12-28 19:42:49 +00:00
Mehdi Amini cc7fbf718d [ThinLTO] Honor -O{0,1,2,4} passed through the libLTO interface for ThinLTO
This was hardcoded to be O3 till now, without any way to change it
without changing the code.

llvm-svn: 290682
2016-12-28 19:37:16 +00:00
Piotr Padlewski e4047b89ad [NewGVN] replace typedefs with usings
llvm-svn: 290680
2016-12-28 19:29:26 +00:00
Piotr Padlewski fc5727b2a2 [NewGVN] NFC fixes
llvm-svn: 290679
2016-12-28 19:17:17 +00:00
Reid Kleckner 92647369fc [WinEH] Don't assume endFunction is called while in .text
Jump table emission can switch to .rdata before
WinException::endFunction gets called. Just remember the appropriate
text section we started in and reset back to it when we end the
function. We were already switching sections back from .xdata anyway.

Fixes the first problem in PR31488, so that now COFF switch tables can
live in .rdata if we want them to.

llvm-svn: 290678
2016-12-28 19:05:12 +00:00
Davide Italiano 0e71480523 [NewGVN] Global sweep replacing NULL with nullptr. NFCI.
llvm-svn: 290670
2016-12-28 14:00:11 +00:00
Davide Italiano 0fb3c7cde5 [NewGVN] Remove redundant code. NFCI.
llvm-svn: 290669
2016-12-28 13:54:16 +00:00
Davide Italiano b111409015 [NewGVN] equals() for loads/stores is the same. Unify.
Differential Revision:  https://reviews.llvm.org/D28116

llvm-svn: 290667
2016-12-28 13:37:17 +00:00
Chandler Carruth 05ca5acc9e [PM] Introduce a devirtualization iteration layer for the new PM.
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.

The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.

The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handes than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.

This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.

Differential Revision: https://reviews.llvm.org/D23114

llvm-svn: 290665
2016-12-28 11:07:33 +00:00
Chandler Carruth 443e57e01d [PM] Teach the CGSCC's CG update utility to more carefully invalidate
analyses when we're about to break apart an SCC.

We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
   a proxy that will convey the invalidation to functions!

Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.

Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.

The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.

While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.

With this patch, the new pass manager an build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.

llvm-svn: 290664
2016-12-28 10:34:50 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Craig Topper 28ec3460e4 [InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC
llvm-svn: 290648
2016-12-28 03:12:42 +00:00
Chandler Carruth 69c5cc69ed [PM] Actually commit the test update that was supposed to accompany
r290644. Sorry for this.

llvm-svn: 290646
2016-12-28 02:31:24 +00:00
Chandler Carruth c6334579e9 [LCG] Teach the ref edge removal to handle a ref edge that is trivial
due to a call cycle.

This actually crashed the ref removal before.

I've added a unittest that covers this kind of interesting graph
structure and mutation.

llvm-svn: 290645
2016-12-28 02:24:58 +00:00
Chandler Carruth e635289ee2 [PM] Disable the loop vectorizer from the new PM's pipeline as it
currenty relies on the old PM's dependency system forming LCSSA.

The new PM will require a different design for this, and for now this is
causing most of the issues I'm currently seeing in testing. I'd like to
get to a testable baseline and then work on re-enabling things one at
a time.

llvm-svn: 290644
2016-12-28 02:24:55 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Kostya Serebryany 2a8440df70 [libFuzzer] add an experimental flag -experimental_len_control=1 that sets max_len to 1M and tries to increases the actual max sizes of mutations very gradually (second attempt)
llvm-svn: 290637
2016-12-27 23:24:55 +00:00
Eric Fiselier aa54e50105 Mark comparator call operator as const
llvm-svn: 290636
2016-12-27 23:15:58 +00:00
Kostya Serebryany 8d75c78d4c [libFuzzer] don't create large random mutations when given an empty seed
llvm-svn: 290634
2016-12-27 22:15:04 +00:00
Kostya Serebryany f24e52c0c2 [sanitizer-coverage] sort the switch cases
llvm-svn: 290628
2016-12-27 21:20:06 +00:00
Hemant Kulkarni e60b294be8 llvm-readobj: ELF: Make DT tags machine aware
llvm-svn: 290623
2016-12-27 19:59:29 +00:00
Kostya Serebryany 823c18147d [libFuzzer] fix UB and simplify the computation of the RNG seed (https://llvm.org/bugs/show_bug.cgi?id=31456)
llvm-svn: 290622
2016-12-27 19:51:34 +00:00
Chandler Carruth e14524ca30 [PM] Teach MemDep to invalidate its result object when its cached
analysis handles become invalid.

Add a test case for its invalidation logic.

llvm-svn: 290620
2016-12-27 19:33:04 +00:00
Saleem Abdulrasool ecf13bbd84 DebugInfo: add explicit casts for -Wqual-cast
Fix a warning detected by gcc 6:
  warning: cast from type 'const void*' to type 'uint8_t* {aka unsigned char*}' casts away qualifiers [-Wcast-qual]

llvm-svn: 290618
2016-12-27 18:35:24 +00:00
Saleem Abdulrasool 1799567f12 ASMParser: use range-based for loops (NFC)
Convert the verify method to use a few more range based for loops,
converting to const iterators in the process.

llvm-svn: 290617
2016-12-27 18:35:22 +00:00
Saleem Abdulrasool 0ce0dc250c test: modernise ARM CodeGen tests
Replace the use of grep with FileCheck.  Tidy up some of the tests.  A
few of the tests have been left as weak as previously, though some have
been made more stringent.

llvm-svn: 290616
2016-12-27 18:35:19 +00:00
Davide Italiano b222549dc5 [NewGVN] Simplify a bit removing else after return. NFCI.
llvm-svn: 290615
2016-12-27 18:15:39 +00:00
Chandler Carruth 56fe48b7e4 [PM] Remove a pointless optimization.
There is no need to do this within an analysis. That method shouldn't
even be reached if this predicate holds as the actual useful
optimization is in the analysis manager itself.

llvm-svn: 290614
2016-12-27 18:04:11 +00:00
Chad Rosier b1ea99a956 Attempt to make the Windows bots green after r290609.
llvm-svn: 290613
2016-12-27 18:02:27 +00:00
Chandler Carruth 7a73eabf64 [PM] Add more dedicated testing to cover the invalidation logic added to
BasicAA in r290603.

I've kept the basic testing in the new PM test file as that also covers
the AAManager invalidation logic. If/when there is a good place for
broader AA testing it could move there.

This test is somewhat unsatisfying as I can't get it to fail even with
ASan outside of explicit checks of the invalidation. Apparently we don't
yet have any test coverage of the BasicAA code paths using either the
domtree or loopinfo -- I made both of them always be null and check-llvm
passed.

llvm-svn: 290612
2016-12-27 17:59:22 +00:00
Bryant Wong 7cb744621b [MemCpyOpt] Don't sink LoadInst below possible clobber.
Differential Revision: https://reviews.llvm.org/D26811

llvm-svn: 290611
2016-12-27 17:58:12 +00:00
Teresa Johnson e0ee5cf7c8 [ThinLTO] Fix "||" vs "|" mixup.
The effect of the bug was that we would incorrectly create summaries
for global and weak values defined in module asm (since we were
essentially testing for bit 1 which is SF_Undefined, and the
RecordStreamer ignores local undefined references). This would have
resulted in conservatively disabling importing of anything referencing
globals and weaks defined in module asm. Added these cases to the test
which now fails without this bug fix.

Fixes PR31459.

llvm-svn: 290610
2016-12-27 17:45:09 +00:00
Chad Rosier 2ff37b8615 [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.
Differential Revision: https://reviews.llvm.org/D27953

llvm-svn: 290609
2016-12-27 16:58:09 +00:00
Artem Tamazov a01cce8887 [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.

Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
	value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.

Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.

Test added.

Differential Revision: https://reviews.llvm.org/D27859

llvm-svn: 290608
2016-12-27 16:00:11 +00:00
Piotr Padlewski 2202aa9765 [MemDep] Operand visited twice bugfix
Because operand was not marked as seen it was visited twice.
It doesn't change behavior of optimization, it just saves redudant
visit, so no test changes.

llvm-svn: 290607
2016-12-27 15:06:07 +00:00
Eugene Leviant 5240a305a4 RuntimeDyldELF: refactor AArch64 relocations. NFC.
llvm-svn: 290606
2016-12-27 13:33:32 +00:00
Eugene Leviant c2f5408ef7 Fix unit test in NDEBUG build
llvm-svn: 290604
2016-12-27 11:07:53 +00:00
Chandler Carruth aa35167578 [PM] Teach BasicAA how to invalidate its result object.
This requires custom handling because BasicAA caches handles to other
analyses and so it needs to trigger indirect invalidation.

This fixes one of the common crashes when using the new PM in real
pipelines. I've also tweaked a regression test to check that we are at
least handling the most immediate case.

I'm going to work at re-structuring this test some to both scale better
(rather than all being in one file) and check more invalidation paths in
a follow-up commit, but I wanted to get the basic bug fix in place.

llvm-svn: 290603
2016-12-27 10:30:45 +00:00
Eugene Leviant 687d4024b5 Attempt to fix build bot after r290597
llvm-svn: 290602
2016-12-27 10:24:58 +00:00
Chandler Carruth 81c8edaf5c [PM] Disable more of the loop passes -- LCSSA and LoopSimplify are also
not really wired into the loop pass manager in a way that will let us
productively use these passes yet.

This lets the new PM get farther in basic testing which is useful for
establishing a good baseline of "doesn't explode". There are still
plenty of crashers in basic testing though, this just gets rid of some
noise that is well understood and not representing a specific or narrow
bug.

llvm-svn: 290601
2016-12-27 10:16:46 +00:00
Sam Kolton e66365e07d [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28051

llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Eugene Leviant 920908352a RuntimeDyldELF: add R_AARCH64_ADD_ABS_LO12_NC reloc
Differential revision: https://reviews.llvm.org/D28115

llvm-svn: 290598
2016-12-27 09:51:38 +00:00
Eugene Leviant c089e406b9 Allow setting multiple debug types
Differential revision: https://reviews.llvm.org/D28109

llvm-svn: 290597
2016-12-27 09:31:20 +00:00
Daniel Berlin 1f31fe529e Change a std::vector to SmallVector in NewGVN
llvm-svn: 290596
2016-12-27 09:20:36 +00:00
Chandler Carruth 17c630a09c [PM] Teach the AAManager and AAResults layer (the worst offender for
inter-analysis dependencies) to use the new invalidation infrastructure.

This teaches it to invalidate itself when any of the peer function
AA results that it uses become invalid. We do this by just tracking the
originating IDs. I've kept it in a somewhat clunky API since some users
of AAResults are outside the new PM right now. We can clean this API up
if/when those users go away.

Secondly, it uses the registration on the outer analysis manager proxy
to trigger deferred invalidation when a module analysis result becomes
invalid.

I've included test cases that specifically try to trigger use-after-free
in both of these cases and they would crash or hang pretty horribly for
me even without ASan. Now they work nicely.

The `InvalidateAnalysis` utility pass required some tweaking to be
useful in this context and it still is pretty garbage. I'd like to
switch it back to the previous implementation and teach the explicit
invalidate method on the AnalysisManager to take care of correctly
triggering indirect invalidation, but I wanted to go ahead and send this
out so folks could see how all of this stuff works together in practice.
And, you know, that it does actually work. =]

Differential Revision: https://reviews.llvm.org/D27205

llvm-svn: 290595
2016-12-27 08:44:39 +00:00
Chandler Carruth ba90ae969c [PM] Introduce the facilities for registering cross-IR-unit dependencies
that require deferred invalidation.

This handles the other real-world invalidation scenario that we have
cases of: a function analysis which caches references to a module
analysis. We currently do this in the AA aggregation layer and might
well do this in other places as well.

Since this is relative rare, the technique is somewhat more cumbersome.
Analyses need to register themselves when accessing the outer analysis
manager's proxy. This proxy is already necessarily present to allow
access to the outer IR unit's analyses. By registering here we can track
and trigger invalidation when that outer analysis goes away.

To make this work we need to enhance the PreservedAnalyses
infrastructure to support a (slightly) more explicit model for "sets" of
analyses, and allow abandoning a single specific analyses even when
a set covering that analysis is preserved. That allows us to describe
the scenario of preserving all Function analyses *except* for the one
where deferred invalidation has triggered.

We also need to teach the invalidator API to support direct ID calls
instead of always going through a template to dispatch so that we can
just record the ID mapping.

I've introduced testing of all of this both for simple module<->function
cases as well as for more complex cases involving a CGSCC layer.

Much like the previous patch I've not tried to fully update the loop
pass management layer because that layer is due to be heavily reworked
to use similar techniques to the CGSCC to handle updates. As that
happens, we'll have a better testing basis for adding support like this.

Many thanks to both Justin and Sean for the extensive reviews on this to
help bring the API design and documentation into a better state.

Differential Revision: https://reviews.llvm.org/D27198

llvm-svn: 290594
2016-12-27 08:40:39 +00:00
Chandler Carruth 625038d5d5 [PM] Turn on the new PM's inliner in addition to the current one for
most of the inliner test cases.

The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.

There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.

llvm-svn: 290592
2016-12-27 07:18:43 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth db6ced8484 [PM] Wire up another test to the new pass manager.
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.

llvm-svn: 290589
2016-12-27 06:46:16 +00:00
George Burgess IV ed16024a9b [Analysis] Ignore `nobuiltin` on `allocsize` function calls.
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.

llvm-svn: 290588
2016-12-27 06:32:14 +00:00
George Burgess IV ce04489515 [Analysis] Refactor as promised in r290397.
This also makes us no longer check for `allocsize` on intrinsic calls.
This shouldn't matter, since intrinsics should provide the information
we get from `allocsize` on their own.

llvm-svn: 290585
2016-12-27 06:10:50 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Chandler Carruth 162504578b [LCG] Teach the LazyCallGraph to handle visiting the blockaddress
constant expression and to correctly form function reference edges
through them without crashing because one of the operands (the
`BasicBlock` isn't actually a constant despite being an operand of
a constant).

llvm-svn: 290581
2016-12-27 05:00:45 +00:00
Craig Topper 89b3e0223f [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.

llvm-svn: 290573
2016-12-27 03:46:05 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth 62c8b81ea8 [Inliner] Modernize all of the inliner tests that were using grep.
This mostly involved converting from grep to FileCheck and tidying up
the IR used.

In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.

llvm-svn: 290570
2016-12-27 02:47:37 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper a0439377e6 [InstCombine][AVX-512] Add masked scalar add/sub/mul/div intrinsic test cases that don't have a CUR_DIRECTION rounding mode.
The CUR_DIRECTION case will be optimized in a future commit so this provides coverage for the other cases.

llvm-svn: 290565
2016-12-27 01:56:27 +00:00
Craig Topper 83f2145c18 [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.
llvm-svn: 290564
2016-12-27 01:56:24 +00:00
Craig Topper 5035b1212b [AVX-512] Add tests to show missed opportunities for combining masking with scalar arithmetic operations.
These particular sequences will be generated after a future change to teach InstCombine to turn masked scalar arithmetic intrinsics into native IR.

llvm-svn: 290563
2016-12-27 01:56:22 +00:00
Chandler Carruth 0ee8bb11c3 [PM] Move the collection of call sites to a more appropriate place
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.

Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.

I've also added a short circuit to the scan for call sites based on what
other code is doing.

With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.

I'll make my way through most of the other inliner test cases
just to get some easy coverage next.

llvm-svn: 290562
2016-12-27 01:24:50 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Mehdi Amini 4506e447c1 [doc] Add mention of the difference in optimization level between Release and RelWithDebInfo in Cmake.rst
This is surprising to many people.

llvm-svn: 290556
2016-12-26 23:42:12 +00:00
Chandler Carruth cc44ab63b6 [ADT] Add an llvm::erase_if utility to make the standard erase+remove_if
pattern easier to write.

Differential Revision: https://reviews.llvm.org/D28120

llvm-svn: 290555
2016-12-26 23:30:44 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Chandler Carruth d9eaa54ef4 [ADT] Add a boring std::partition wrapper similar to our std::remove_if
wrapper.

llvm-svn: 290553
2016-12-26 23:10:40 +00:00
Daniel Berlin 85f91b0ec3 clang-format NewGVN files
llvm-svn: 290551
2016-12-26 20:06:58 +00:00
Daniel Berlin 85cbc8c097 Misc cleanups and simplifications for NewGVN.
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28111

llvm-svn: 290550
2016-12-26 19:57:25 +00:00
Daniel Berlin d59e8010c5 Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472
llvm-svn: 290549
2016-12-26 18:44:36 +00:00
Davide Italiano fe7a3ee51e [NewGVN] Add a flag to enable the pass via `-mllvm`.
NewGVN can be tested passing `-mllvm -enable-newgvn` to clang.

Differential Revision:  https://reviews.llvm.org/D28059

llvm-svn: 290548
2016-12-26 18:26:19 +00:00
Davide Italiano 8ea5e4fcae [NewGVN] Change test to reflect difference between GVN and NewGVN.
The current GVN algorithm folds unconditional branches to, it claims,
expose more PRE oportunities. The folding, if really needed,
(which is not sure, as it's not really proved it improves analysis)
can be done by an earlier cleanup pass instead of GVN itself.
Ack'ed/SGTM'd by Daniel Berlin.

Differential Revision:  https://reviews.llvm.org/D28117

llvm-svn: 290546
2016-12-26 18:10:09 +00:00
Simon Pilgrim cd9d729461 Wdocumentation fix
llvm-svn: 290545
2016-12-26 17:48:19 +00:00
Simon Pilgrim e8a5ab35ca [X86][AVX512] Added v64i8 reverse shuffle test (PR31470)
llvm-svn: 290544
2016-12-26 17:38:58 +00:00
Davide Italiano a312ca845c [NewGVN] Fold lookupOperandLeader() when there's only one use. NFCI.
llvm-svn: 290543
2016-12-26 16:19:34 +00:00
Bryant Wong b5e03b61e2 [InstCombiner] Simplify lib calls to `round{,f}`
Differential Revision: https://reviews.llvm.org/D28110

llvm-svn: 290542
2016-12-26 14:29:29 +00:00
Chandler Carruth 80db76d556 Test the different scenarios of GlobalDCE and comdats more
systematically and document in the test what all is going on.

This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.

For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.

Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.

llvm-svn: 290537
2016-12-26 08:54:01 +00:00
Craig Topper 5ef13ba18b [AVX-512] Fix some patterns to use extended register classes.
llvm-svn: 290536
2016-12-26 07:26:07 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper f56d985f77 [AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will.
A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering.

llvm-svn: 290532
2016-12-26 01:40:17 +00:00
Chandler Carruth 0cf829c171 Fix some bad indentation that I or another introduced somehow.
llvm-svn: 290531
2016-12-26 01:20:59 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
Bryant Wong c6b46d80c8 Fix `update_test_checks.py` bug that incorrectly truncates IR body.
Differential Revision: https://reviews.llvm.org/D26619

llvm-svn: 290529
2016-12-25 23:46:55 +00:00
Chandler Carruth cb22b89f3f [ADT] Add a generic concatenating iterator and range (take 2).
This recommits r290512 that was reverted when MSVC failed to compile it. Since
then I've played with various approaches using rextester.com (where I was able
to reproduce the failure) and think that I have a solution thanks in part to
the help of Dave Blaikie! It seems MSVC just has a defective `decltype` in this
version. Manually writing out the type seems to do the trick, even though it is
.... quite complicated.

Original commit message:
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.

I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.

Differential Revision: https://reviews.llvm.org/D28093

llvm-svn: 290528
2016-12-25 23:41:14 +00:00
Bryant Wong 4213d94142 [MemorySSA] Define a restricted upward AccessList splice.
Differential Revision: https://reviews.llvm.org/D26661

llvm-svn: 290527
2016-12-25 23:34:07 +00:00
Bryant Wong a07d9b1460 [AliasAnalysis] Teach BasicAA about memcpy.
Differential Revision: https://reviews.llvm.org/D27034

llvm-svn: 290526
2016-12-25 22:42:27 +00:00
Daniel Berlin d7c12ee54c Value number stores and memory states so we can detect when memory states are equivalent (IE store of same value to memory).
Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28084

llvm-svn: 290525
2016-12-25 22:23:49 +00:00
Daniel Berlin 65f5f0d728 Rename GVNExpression *ops_ members to *op_* to match conventions in the rest of LLVM
llvm-svn: 290524
2016-12-25 22:10:37 +00:00
Lang Hames c9d0ff1302 [Orc][RPC] Add a ParallelCallGroup utility for dispatching and waiting on
multiple asynchronous RPC calls.

ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.

This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.

llvm-svn: 290523
2016-12-25 21:55:05 +00:00
Lang Hames aac390ee85 [Orc][RPC] Clang-format RPCUtils header.
Some of the recent RPC call type-checking changes weren't formatted prior to
commit.

llvm-svn: 290520
2016-12-25 19:55:59 +00:00
Greg Clayton 1eb0bca178 Add newline to end of file to quiet warnings.
llvm-svn: 290519
2016-12-25 18:41:47 +00:00
Michael Zuckerman 86602e85dd revert commit 290516
llvm-svn: 290517
2016-12-25 12:45:18 +00:00
Michael Zuckerman 45aa420640 Commit try added new empty line
llvm-svn: 290516
2016-12-25 12:01:34 +00:00
Amjad Aboud 7faeecc8f7 [DebugInfo] Added support for Checksum debug info feature.
Differential Revision: https://reviews.llvm.org/D27642

llvm-svn: 290514
2016-12-25 10:12:09 +00:00
Chandler Carruth 5dc0bba4e4 Revert r290512: [ADT] Add a generic concatenating iterator and range.
This code doesn't work on MSVC for reasons that elude me and I've not
yet covinced a workaround to compile cleanly so reverting for now while
I play with it.

llvm-svn: 290513
2016-12-25 09:36:24 +00:00
Chandler Carruth fba73aec72 [ADT] Add a generic concatenating iterator and range.
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.

I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.

Differential Revision: https://reviews.llvm.org/D28093

llvm-svn: 290512
2016-12-25 08:22:50 +00:00
Mehdi Amini 690952d15e MetadataLoader: replace the tracking of ForwardReferences and UnresolvedNodes with a set-based solution (NFC)
This makes it explicit what is the exact list to handle, and it
looks much more easy to manipulate and understand that the
previous custom tracking of min/max to express the range where
to look for.

Differential Revision: https://reviews.llvm.org/D28089

llvm-svn: 290507
2016-12-25 04:22:54 +00:00
Mehdi Amini 4f90ee0010 MetadataLoader: add an extra assertion in Placeholders flush (NFC)
We don't expect any forward reference at this point.

llvm-svn: 290506
2016-12-25 03:55:53 +00:00
Daniel Berlin a7b624ec6a Add range iterator for blocks in MemoryPhi
llvm-svn: 290504
2016-12-24 21:52:10 +00:00
Simon Pilgrim 3265d951b6 [InstCombine][X86] Add tests showing missed opportunities to simplify PMULUDQ/PMULDQ inputs.
PMULUDQ/PMULDQ - only the even elements (0, 2, 4, 6) of the vXi32 inputs are required.

llvm-svn: 290502
2016-12-24 17:30:19 +00:00
Bryant Wong 430f98a58b Test commit.
llvm-svn: 290501
2016-12-24 17:26:38 +00:00
Davide Italiano 463c32eaf6 [NewGVN] Prefer `auto` to explicit type when the latter is obvious.
llvm-svn: 290499
2016-12-24 17:17:21 +00:00
Davide Italiano 4f84764e32 [NewGVN] Simplify several equals() member functions. NFCI.
llvm-svn: 290498
2016-12-24 17:14:19 +00:00
Davide Italiano d42deb4014 [PM] Remove vestiges of NoAA. NFCI.
llvm-svn: 290496
2016-12-24 16:14:05 +00:00
Ed Maste 178a4e5f8d llvm-objdump: sort phdr type strings in advance of adding new ones
llvm-svn: 290494
2016-12-24 14:53:45 +00:00
Simon Pilgrim 0d66d29678 [SelectionDAG] Early out from computeKnownBits when we know we will have no common bits.
Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits.

llvm-svn: 290490
2016-12-24 12:59:35 +00:00
Chandler Carruth 534d644b86 [PM] Try to improve the comments here to make what's going on more
clear.

Based on post-commit review suggestion from Sean. (Thanks!)

llvm-svn: 290488
2016-12-24 05:11:17 +00:00
Daniel Berlin 8a6a86146c Mark isOnlyReachableViaThisEdge as const
llvm-svn: 290468
2016-12-24 00:04:07 +00:00
Mehdi Amini 4fe6a8c826 Add an assertion for cl::opt names: they can't start with '-'
llvm-svn: 290467
2016-12-23 23:55:26 +00:00
Mehdi Amini 70a02b0923 llvm-size: remove leading dash in '-radix' option
cl::opt does not accept such option

llvm-svn: 290466
2016-12-23 23:55:08 +00:00
Mehdi Amini a9d7aacd4d llvm-readobj: remove leading dash in '-a' option (ARMAttributesShort)
cl::opt does not accept such option

llvm-svn: 290465
2016-12-23 23:54:52 +00:00
Mehdi Amini 95c1f91545 llvm-lto2: remove leading '-' for cl::opt declaration
llvm-svn: 290464
2016-12-23 23:54:34 +00:00
Mehdi Amini 14f19bd012 llvm-lto2: Print diagnostics before exiting (NFC)
llvm-svn: 290463
2016-12-23 23:54:17 +00:00
Mehdi Amini 4c80946850 llvm-lto: pass errs() to the module verifier (NFC)
It is more friendly to have the actual diagnostic when the
verifier fails.

llvm-svn: 290462
2016-12-23 23:53:57 +00:00
Chandler Carruth cdfdd4330a [PM] Remove a bunch of junk that snuck in when I failed at manipulating
my editor to close and commit the patch. Sorry for the noise.

llvm-svn: 290460
2016-12-23 23:39:31 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Chandler Carruth f32f63f222 [PM] Clean up test case and comments a bit. NFC.
llvm-svn: 290456
2016-12-23 23:33:32 +00:00
Chandler Carruth 060ad61fbe [PM] Add support for building a default AA pipeline to the PassBuilder.
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]

Differential Revision: https://reviews.llvm.org/D28076

llvm-svn: 290449
2016-12-23 20:38:19 +00:00
Mehdi Amini 94f86ad4e0 Function-import: Disable IRVerifier on lazy-loaded modules: the ODR TypeUniquing generates invalid debug info.
llvm-svn: 290442
2016-12-23 19:19:44 +00:00
Mehdi Amini fc06b83ee7 Fix build after r290437 (missing include)
llvm-svn: 290438
2016-12-23 18:04:51 +00:00
Mehdi Amini 9a9077fdad FunctionImport: fix typo '#ifndef NDEBUG' instead of '#ifndef DEBUG'
llvm-svn: 290437
2016-12-23 17:59:24 +00:00
Jan Vesely 206a510e54 AMDGPU: split ret/noret patterns for global atomics
Differential Revision: https://reviews.llvm.org/D27989

llvm-svn: 290435
2016-12-23 15:34:51 +00:00
Davide Italiano b9ff23a402 [LICM] Plug a leak freeing the ASTs before clearing the map.
llvm-svn: 290433
2016-12-23 15:02:35 +00:00
Piotr Padlewski 383edba1fd [MemDep] NFC changes
llvm-svn: 290428
2016-12-23 13:13:32 +00:00
Davide Italiano 34f94384a5 [LICM] Work around LICM needs to maintain state across loops.
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.

Differential Revision:  https://reviews.llvm.org/D25848

llvm-svn: 290427
2016-12-23 13:12:50 +00:00
Renato Golin 21da340f7a [AArch64] Cortex-A57 FDIV/FSQRT scheduling fix (W-unit)
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).

This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.

Patch by Andrew Zhogin.

llvm-svn: 290426
2016-12-23 12:51:41 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Davide Italiano 0ff941620c [NewGVN] Remove (for now) unused code. NFCI.
llvm-svn: 290420
2016-12-23 10:28:30 +00:00
Mehdi Amini 96cdc49305 [ThinLTO] Verify lazy-loaded source module for function importing when assertions are enabled (NFC)
llvm-svn: 290416
2016-12-23 05:16:19 +00:00
Mehdi Amini 9f926f70c1 MetadataLoader: split the creation of a single metadata out of a Record into its own function (NFC)
This is pure code motion, will just make it more reusable when I'll
attempt to lazy-load Metadats on-demand.

llvm-svn: 290414
2016-12-23 03:59:18 +00:00
Dan Gohman 00d734d89b [WebAssembly] Annotate call and load/store immediates.
These will be used to guide the binary encoding of these immediates.

llvm-svn: 290412
2016-12-23 03:23:52 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Mehdi Amini 37c178b6f5 MetadataLoader: Reinitialize MinFwdRef/MaxFwdRef after resolving cycles (NFC)
This put the Loader back in a consistent state.

llvm-svn: 290409
2016-12-23 02:20:12 +00:00
Mehdi Amini 5ae6170fc2 MetadataLoader: Add an assertion for the implicit invariant of PlaceHolder while loading Metadata (NFC)
llvm-svn: 290408
2016-12-23 02:20:09 +00:00
Mehdi Amini 70a9cd4cbe MetadataLoader: Make sure every member of MetadataLoader are initialized (NFC)
llvm-svn: 290407
2016-12-23 02:20:07 +00:00
Mehdi Amini ec68dd49bf MetadataLoader: Refactor "IsImporting" into the Pimpl for the MetadataLoader (NFC)
Keeping all the state together will make it easier to handle.

llvm-svn: 290406
2016-12-23 02:20:02 +00:00
Chandler Carruth eb119ece4a Fix some DOS-style line endings that I suspect snuck in from one of the
frustrating Subversion clients that fails to do line ending translation
of text files.

llvm-svn: 290404
2016-12-23 02:02:26 +00:00
NAKAMURA Takumi f931d66e0d KillTheDoctor.cpp: Appease cases on case-senstitive host, like mingw on linux.
llvm-svn: 290402
2016-12-23 01:39:26 +00:00
NAKAMURA Takumi 3166854827 KillTheDoctor: Add a required system lib, psapi. KillTheDoctor itself uses Win32 API directly.
llvm-svn: 290401
2016-12-23 01:39:20 +00:00
Chandler Carruth ee08676102 Enable '-Wstring-conversion' and fix some bad asserts that it helped
find.

Notable is the assert in NewGVN which had no effect because of the bug.

llvm-svn: 290400
2016-12-23 01:38:06 +00:00
George Burgess IV ccae43a247 Don't consider allocsize functions to be allocation functions.
This patch fixes some ASAN unittest failures on FreeBSD. See the
cfe-commits email thread for r290169 for more on those.

According to the LangRef, the allocsize attribute only tells us about
the number of bytes that exist at the memory location pointed to by the
return value of a function. It does not necessarily mean that the
function will only ever allocate. So, we need to be very careful about
treating functions with allocsize as general allocation functions. This
patch makes us fully conservative in this regard, though I suspect that
we have room to be a bit more aggressive if we want.

This has a FIXME that can be fixed by a relatively straightforward
refactor; I just wanted to keep this patch minimal. If this sticks, I'll
come back and fix it in a few days.

llvm-svn: 290397
2016-12-23 01:18:09 +00:00
Sanjoy Das 50fef4321b NFC code motion in ImplicitNullChecks
Extract out two large lambdas into top level member functions.

llvm-svn: 290395
2016-12-23 00:41:24 +00:00
Sanjoy Das 9a129807f3 Reimplement depedency tracking in the ImplicitNullChecks pass
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason.  Please review this as essentially a new pass.  The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.

The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits.  The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 is practice.

This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection.  It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).

Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambda into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.

Reviewers: reames, anna, atrick

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27592

llvm-svn: 290394
2016-12-23 00:41:21 +00:00
Chris Bieneman 7e98468f1e [ObjectYAML] Fixing a compiler warning
Accidentally re-defined the variable instead of setting it. Oops!

llvm-svn: 290388
2016-12-22 22:58:07 +00:00
Quentin Colombet 3749f33888 [GlobalISel] More fix for the size vs. type typo. NFC.
I missed those in my previous commit (r290378).

llvm-svn: 290387
2016-12-22 22:50:34 +00:00
Chris Bieneman e0e451d927 [ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.

This re-lands r290147, reverted in 290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big endian systems.

After adding support for preserving endianness, this should be good now.

llvm-svn: 290386
2016-12-22 22:44:27 +00:00
Ahmed Bougacha 1277833aa6 [AArch64] Simplify indexed-memory testcase. NFC.
We're only testing the addressing mode on the stores; we don't
need to load/store pointers we can simply pass/return.

llvm-svn: 290385
2016-12-22 22:27:05 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Chris Bieneman e477fb9591 [ObjectYAML] Fixing big endian bots from r290381
Bot URL:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/2505

llvm-svn: 290383
2016-12-22 22:16:04 +00:00
Chris Bieneman 55de3a2449 [ObjectYAML] MachO support for endianness
This patch adds support to the macho<->yaml tools for preserving endianness in MachO structures and DWARF data.

llvm-svn: 290381
2016-12-22 21:58:03 +00:00
Quentin Colombet fa5960a28b [MachineVerifier] Check that even generic vregs comply to regclass constraints.
We used to not check generic vregs, but that is actually a mistake given
nothing in the GlobalISel pipeline is going to fix the constraints on
target specific instructions. Therefore, the target has to have them
right from the start.

llvm-svn: 290380
2016-12-22 21:56:39 +00:00
Quentin Colombet f372150f73 [AArch64] Change a test to use a generic instr instead of a target specific one.
Target specific instructions have requirements that are not compatible
with what we want to test here. Namely, target specific instructions
must have their operands properly mapped on register classes.

llvm-svn: 290379
2016-12-22 21:56:37 +00:00
Quentin Colombet e08cc599b8 [MIRParser] Fix a typo in comment and error message.
We have long switched from size to type.

llvm-svn: 290378
2016-12-22 21:56:35 +00:00
Quentin Colombet f38015e5fe [AArch64][CallLowering] Constraint registers on target specific instruction
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.

This fixes invalid register classes on call instruction.

llvm-svn: 290377
2016-12-22 21:56:31 +00:00
Quentin Colombet 9751e61fe1 [MIRParser] Non-generic virtual register may have a type.
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.

llvm-svn: 290376
2016-12-22 21:56:29 +00:00
Quentin Colombet 7e1f66d6f5 [RegisterBankInfo] Allow to set a register class when nothing else is set
This is going to be needed to be able to constraint register class on
target specific instruction while the RegBankSelect pass did not run
yet.

llvm-svn: 290375
2016-12-22 21:56:26 +00:00
Quentin Colombet b4e71185b2 [GlobalISel] Refactor the logic to constraint registers.
Move the logic to constraint register from InstructionSelector to a
utility function. It will be required by other passes in the GlobalISel
pipeline.

llvm-svn: 290374
2016-12-22 21:56:19 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Tim Shen 53ddc1d0f4 [PowerPC] Add ppc support to update_llc_test_checks.py, and ppc tests. NFC.
Reviewers: chandlerc, hfinkel, echristo, iteratee

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28036

llvm-svn: 290370
2016-12-22 20:59:39 +00:00
Krzysztof Parzyszek 3885d87c60 [Hexagon] Add DAG mutations for machine pipeliner
llvm-svn: 290366
2016-12-22 19:44:55 +00:00
Wei Mi a2f0b594c2 Redo store splitting in CodeGenPrepare.
This is a succeeding patch of https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redoing
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.

Differential Revision: https://reviews.llvm.org/D25914

llvm-svn: 290365
2016-12-22 19:44:45 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Petar Jovanovic 8a4e63994e [mips] Fix compact branch hazard detection, part 2
Follow up to D27209 fix, this patch now properly handles single transient
instruction in basic block.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D27856

llvm-svn: 290361
2016-12-22 19:29:50 +00:00
Krzysztof Parzyszek 8839124848 Add the DAG mutation interface to the software pipeliner
llvm-svn: 290360
2016-12-22 19:21:20 +00:00
Reid Kleckner c2b56634cf Pass -Wa,-mbig-obj in 64-bit mingw builds
COFF has a 2**16 section limit, and on Win64, every COMDAT function
creates at least 3 sections: .text, .pdata, and .xdata. For MSVC, we
enable bigobj on a file-by-file basis, but GCC appears to hit the limit
on different files.

Fixes PR25953

llvm-svn: 290358
2016-12-22 19:12:14 +00:00
Reid Kleckner 143a937f79 Build KillTheDoctor with mingw-w64
compiler-rt uses it in its lit tests.

llvm-svn: 290357
2016-12-22 19:11:42 +00:00
Krzysztof Parzyszek df24da221e Fix two bugs in the pipeliner in renaming phis in the prolog and epilog
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.

Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.

Patch by Brendon Cahoon.

llvm-svn: 290355
2016-12-22 18:49:55 +00:00
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Davide Italiano e05e3306a3 [NewGVN] Add the pass to PassRegistry.def.
We need to hook up here to get it working with the new PM.
Add a test while here (and remove a typo).

llvm-svn: 290350
2016-12-22 16:35:02 +00:00
Matt Arsenault 3c97e2030a AMDGPU: Fix missing 16-bit cmpx instructions
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault fef7beb6a6 AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert
Caused by dereferencing end iterator when trying to const cast the iterator.

Patch by Martin Sherburn

llvm-svn: 290347
2016-12-22 16:06:32 +00:00
Davide Italiano 7e274e02ae [GVN] Initial check-in of a new global value numbering algorithm.
The code have been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...

The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The test currently living in test/Transform/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.

Differential Revision:  https://reviews.llvm.org/D26224

llvm-svn: 290346
2016-12-22 16:03:48 +00:00
Dan Gohman 8b4340a5dd [WebAssembly] Add an "explicit" keyword to a constructor.
llvm-svn: 290345
2016-12-22 16:03:02 +00:00
Dan Gohman 207ed22660 [WebAssembly] Don't use variadic operand indices in the MCOperandInfo array.
llvm-svn: 290344
2016-12-22 16:00:55 +00:00
Dan Gohman 728926ac59 [WebAssembly] Don't old negative load/store offsets in fast-isel.
WebAssembly's load/store offsets are unsigned and don't wrap, so it's not
valid to fold in a negative offset.

llvm-svn: 290342
2016-12-22 15:15:10 +00:00
Sam Kolton a568e3dde7 [AMDGPU] Add pseudo SDWA instructions
Summary: This is needed for later SDWA support in CodeGen.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27412

llvm-svn: 290338
2016-12-22 12:57:41 +00:00
Sam Kolton a6792a39c4 [AMDGPU] Disassembler: fix for disaasembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands.

Reviewers: nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27847

llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Ayman Musa 9ff608cdc6 [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.

Differential Revision: https://reviews.llvm.org/D28024

llvm-svn: 290333
2016-12-22 08:42:46 +00:00
Chandler Carruth 0d1d49507b [PM] Loosen the check ever so slightly -- MSVC appears to not include
a space after the comma in template arguments with our hacky type name
system.

llvm-svn: 290331
2016-12-22 07:53:20 +00:00
Chandler Carruth ee6865f425 [PM] Make a couple of CHECK lines a bit more precise, NFC.
I was staring at these and didn't realize these were module-layer
proxies as opposed to some other layer. Justin and I have a plan to
rename things to make the names themselves much easier to reason about,
but I at least want the CHECK lines to be precise for now.

llvm-svn: 290328
2016-12-22 07:14:35 +00:00
Chandler Carruth 9c36c922d9 [PM] Remove now-dead extern template and explicit instantiation
declarations.

We're using a custom class here instead of the helper template, these
bits just didn't get deleted when the other bits did get deleted. This
was found by a really nice MSVC warning about explicitly instantiating
a template where some member functions aren't defined and thus can't be
instantiatied.

llvm-svn: 290327
2016-12-22 07:14:33 +00:00
Chandler Carruth e3f5064b72 [PM] Introduce a reasonable port of the main per-module pass pipeline
from the old pass manager in the new one.

I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.

I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.

Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.

One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.

I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.

I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.

Differential Revision: https://reviews.llvm.org/D28042

llvm-svn: 290325
2016-12-22 06:59:15 +00:00
Adrian Prantl 5542da4bbc Fix an assertion in DwarfExpression when emitting fragments in vector registers
When DwarfExpression is emitting a fragment that is located in a
register and that fragment is smaller than the register, and the
register must be composed from sub-registers (are you still with me?)
the last DW_OP_piece operation must not be larger than the size of the
fragment itself, since the last piece of the fragment could be smaller
than the last subregister that is being emitted.

rdar://problem/29779065

llvm-svn: 290324
2016-12-22 06:10:41 +00:00
Adrian Prantl 49797ca6be Refactor the DIExpression fragment query interface (NFC)
... so it becomes available to DIExpressionCursor.

llvm-svn: 290322
2016-12-22 05:27:12 +00:00
Matt Arsenault 485dacd90c DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.

llvm-svn: 290317
2016-12-22 04:39:45 +00:00
Matt Arsenault 3de76b9dc8 AMDGPU: Fix missing commute table entries for cmpx
No tests because these aren't currently used anywhere.

llvm-svn: 290316
2016-12-22 04:39:41 +00:00
Mehdi Amini 9d3248b765 [ThinLTO] Save 8B per summary entry by rearranging the fields (NFC)
Size goes from 72B to 64B per entry.

Differential Revision: https://reviews.llvm.org/D27970

llvm-svn: 290314
2016-12-22 04:09:29 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault 46e6b7adef AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault 770ec8680a AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault d8b73d5304 AMDGPU: Move combines into separate functions
llvm-svn: 290309
2016-12-22 03:44:42 +00:00
Matt Arsenault ef82ad94ea AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault 9e22bc2cd3 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault 2920f62423 AMDGPU: setcc test cleanup
llvm-svn: 290306
2016-12-22 03:21:45 +00:00
Matt Arsenault cdff21b14e AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault 4052a576c0 AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault ce84130f85 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault 4e55c1ec11 AMDGPU: Update isFPImmLegal for f16
I don't think this matters because ConstantFP is legal.

llvm-svn: 290299
2016-12-22 03:05:30 +00:00
Peter Collingbourne 704f814a5e Clear the PendingTypeTests vector after moving from it.
This is to put the vector into a well defined state. Apparently the state of a
vector after being moved from is valid but unspecified. Found with clang-tidy.

llvm-svn: 290298
2016-12-22 02:52:23 +00:00
Haicheng Wu 9ac20a1e10 [AArch64] Correct the check of signed 9-bit imm in getIndexedAddressParts().
-256 is a legal indexed address part.

Differential Revision: https://reviews.llvm.org/D27537

llvm-svn: 290296
2016-12-22 01:39:24 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
David Majnemer 5fa7d48bb8 [NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic, range
metadata already present could be more precise.

llvm-svn: 290294
2016-12-22 00:51:59 +00:00
Adrian Prantl 1eadba1c8c Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Adrian Prantl 58c1910642 [LLParser] Make the line field of DIMacro(File) optional.
Otherwise these records do not survive roundtrips.

llvm-svn: 290291
2016-12-22 00:29:00 +00:00
Adrian Prantl ec9ebba778 Legalize metadata in legacy testcases
llvm-svn: 290288
2016-12-21 23:38:17 +00:00
Adrian Prantl 762e4b72c6 Legalize metadata in legacy testcases
llvm-svn: 290287
2016-12-21 23:36:06 +00:00
Adrian Prantl aad5df484c Legalize metadata in legacy testcases
llvm-svn: 290286
2016-12-21 23:30:35 +00:00
Adrian Prantl b767f31290 Legalize metadata in legacy testcases
llvm-svn: 290285
2016-12-21 23:28:49 +00:00
Ahmed Bougacha 36f7035bd7 [GlobalISel] Add basic Selector-emitter tblgen backend.
This adds a basic tablegen backend that analyzes the SelectionDAG
patterns to find simple ones that are eligible for GlobalISel-emission.

That's similar to FastISel, with one notable difference: we're not fed
ISD opcodes, so we need to map the SDNode operators to generic opcodes.
That's done using GINodeEquiv in TargetGlobalISel.td.

Otherwise, this is mostly boilerplate, and lots of filtering of any kind
of "complicated" pattern. On AArch64, this is sufficient to match G_ADD
up to s64 (to ADDWrr/ADDXrr) and G_BR (to B).

Differential Revision: https://reviews.llvm.org/D26878

llvm-svn: 290284
2016-12-21 23:26:20 +00:00
Ahmed Bougacha aa9fe53278 [AsmWriter] Remove redundant cast<>s. NFC.
llvm-svn: 290283
2016-12-21 23:26:13 +00:00
Dan Gohman a2b9b349e7 [WebAssembly] Fix the opcode value for i64.rotr.
llvm-svn: 290281
2016-12-21 23:09:42 +00:00
Peter Collingbourne 1b4137a7f9 IR: Function summary representation for type tests.
Each function summary has an attached list of type identifier GUIDs. The
idea is that during the regular LTO phase we would match these GUIDs to type
identifiers defined by the regular LTO module and store the resolutions in
a top-level "type identifier summary" (which will be implemented separately).

Differential Revision: https://reviews.llvm.org/D27967

llvm-svn: 290280
2016-12-21 23:03:45 +00:00
Mike Aizatsky bfe5045b9c [sancov] skip duplicated points
llvm-svn: 290278
2016-12-21 22:10:01 +00:00
Mike Aizatsky 987f6420ac [sancov] hash prefix results in huge merge files, use shorter prefix
llvm-svn: 290277
2016-12-21 22:09:57 +00:00
Haicheng Wu 6bb0e39321 [AArch64] Remove a redundant check. NFC.
The case AM.Scale == 0 is already handled by the code right above.

Differential Revision: https://reviews.llvm.org/D28003

llvm-svn: 290275
2016-12-21 21:40:47 +00:00
Greg Clayton 78a07bfa66 Add the ability for DWARFDie objects to get the parent DWARFDie.
In order for the llvm DWARF parser to be used in LLDB we will need to be able to get the parent of a DIE. This patch adds that functionality by changing the DWARFDebugInfoEntry class to store a depth field instead of a sibling index. Using a depth field allows us to easily calculate the sibling and the parent without increasing the size of DWARFDebugInfoEntry.

I tested llvm-dsymutil on a debug version of clang where this fully parses DWARF in over 1200 .o files to verify there was no serious regression in performance.

Added a full suite of unit tests to test this functionality.

Differential Revision: https://reviews.llvm.org/D27995

llvm-svn: 290274
2016-12-21 21:37:06 +00:00
Justin Bogner c11760d4ed cmake: Don't build llvm-config and tblgen concurrently in cross builds
This sets USES_TERMINAL for the native llvm-config build, so that it
doesn't run at the same time as builds of other native tools (namely,
tablegen). Without this, if you're very unlucky with the timing it's
possible to be relinking libSupport as one of the tools is linking,
causing a spurious failure.

The tablegen build adopted USES_TERMINAL for this same reason in
r280748.

llvm-svn: 290271
2016-12-21 21:19:00 +00:00
Ed Maste 084062803e Update mailing list post URL and add libunwind reference
RTDyldMemoryManager.cpp describes the differing __register_frame
API between libunwind and libgcc, with a mailing list posting URL.

The original link was 404; replace it with what I believe is the
intended post, as well as a reference to the "OS X" implementation in
libunwind.

Differential Revision:	https://reviews.llvm.org/D27965

llvm-svn: 290269
2016-12-21 20:51:42 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
David Majnemer b0761a0c1b Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
Tom Stellard d8ea85aced AMDGPU/SI: Fix file header
llvm-svn: 290265
2016-12-21 19:06:24 +00:00
Peter Collingbourne 35f3f7cdc7 TypeMetadataUtils: Simplify; spotted by Mehdi.
llvm-svn: 290264
2016-12-21 19:00:47 +00:00
Zachary Turner ab266cf95b Add missing includes on Windows.
Patch by Andrey Khalyavin
Differential Revision: https://reviews.llvm.org/D27915

llvm-svn: 290263
2016-12-21 18:50:52 +00:00
Michael Kuperstein 88f15eedbb [LLParser] Parse vector GEP constant expression correctly
The constantexpr parsing was too constrained and rejected legal vector GEPs.
This relaxes it to be similar to the ones for instruction parsing.

This fixes PR30816.

Differential Revision: https://reviews.llvm.org/D28013

llvm-svn: 290261
2016-12-21 18:29:47 +00:00
Michael Kuperstein dd92c78669 [ConstantFolding] Fix vector GEPs harder
For vector GEPs, CastGEPIndices can end up in an infinite recursion, because
we compare the vector type to the scalar pointer type, find them different,
and then try to cast a type to itself.

Differential Revision: https://reviews.llvm.org/D28009

llvm-svn: 290260
2016-12-21 17:34:21 +00:00