Commit Graph

41882 Commits

Author SHA1 Message Date
Dylan McKay 97cf837b46 [AVR] Optimize 16-bit ANDs with '1'
Summary: Fixes PR 31345

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28186

llvm-svn: 290778
2016-12-31 01:07:14 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Simon Pilgrim c5fde8d748 [X86][AVX512DQ] Add truncated math tests for AVX512DQ.
llvm-svn: 290772
2016-12-30 22:43:41 +00:00
Simon Pilgrim 85af973506 [X86][SSE] Fix truncated math test names.
Inconsistent naming convention and wrong name for some input/output types.

llvm-svn: 290771
2016-12-30 22:40:32 +00:00
Simon Pilgrim 712374169d [X86][AVX512] Regenerate test - missing shuffle comments
llvm-svn: 290770
2016-12-30 22:31:33 +00:00
Piotr Padlewski da36215017 [MemDep] Handle gep with zeros for invariant.group
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because it make SROA can then handle it.

Simple case like

    void g(A &a) {
        z(a);
        if (glob)
            a.foo();
    }
    void testG() {
        A a;
        g(a);
    }

was not devirtualized with -fstrict-vtable-pointers because luck of
handling for gep 0 in Memory Dependence Analysis

Reviewers: dberlin, nlewycky, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28126

llvm-svn: 290763
2016-12-30 18:45:07 +00:00
Mehdi Amini e2770c0b80 Fix test change in r290736: restore index generation
I remove one extra line, but because annoyingly llvm-lit does not
clean the output directory before running the test, it didn't fail
locally (the file was present from a previous run).

llvm-svn: 290740
2016-12-30 01:15:50 +00:00
Michael Kuperstein 76e06c8858 [LICM] When promoting scalars, allow inserting stores to thread-local allocas.
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.

Differential Revision: https://reviews.llvm.org/D28170

llvm-svn: 290738
2016-12-30 01:03:17 +00:00
Dehao Chen cc76344ef5 Use continuous boosting factor for complete unroll.
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects code-gen of two speccpu2006 benchmark:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.

Reviewers: mzolotukhin, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26989

llvm-svn: 290737
2016-12-30 00:50:28 +00:00
Mehdi Amini 30a9b6bb4e Replace test from using llvm-lto to use llvm-link (NFC)
Some incoming changes in ThinLTO will break this test.
Instead of relying on the heuristic to import, we
force the importing to happen with llvm-link.

llvm-svn: 290736
2016-12-30 00:45:26 +00:00
Dylan McKay 453d042969 [AVR] Optimize 16-bit ORs with '0'
Summary: Fixes PR 31344

Authored by Anmol P. Paralkar

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28121

llvm-svn: 290732
2016-12-30 00:21:56 +00:00
Daniel Berlin e0bd37e78f NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again.
llvm-svn: 290724
2016-12-29 22:15:12 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Sanjoy Das 00d76a5754 [TBAAVerifier] Be stricter around verifying scalar nodes
This fixes the issue exposed in PR31393, where we weren't trying
sufficiently hard to diagnose bad TBAA metadata.

This does reduce the variety in the error messages we print out, but I
think the tradeoff of verifying more, simply and quickly overrules the
need for more helpful error messags here.

llvm-svn: 290713
2016-12-29 15:47:05 +00:00
Artem Tamazov 25478d821b [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
Among other stuff, this allows to use predefined .option.machine_version_major
/minor/stepping symbols in the directive.

Relevant test expanded at once (also file renamed for clarity).

Differential Revision: https://reviews.llvm.org/D28140

llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Craig Topper b57a84dace [InstCombine] Fix some of the AVX-512 scalar arithmetic test cases to do a better job of testing what they intended to test.
The accidentally had trivially dead code. Also needed to adjust the rounding mode to not CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts.

llvm-svn: 290702
2016-12-29 02:29:04 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Mehdi Amini e98f925834 Add an index for Module Metadata record in the bitcode
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Recommit r290684 (was reverted in r290686 because a test
was broken) after adding a threshold to avoid emitting
the index when unnecessary (little amount of metadata).
This optimization "hides" a limitation of the ability
to backpatch in the bitstream: we can only backpatch
safely when the position has been flushed. So if we emit
an index for one metadata, it is possible that (part of)
the offset placeholder hasn't been flushed and the backpatch
will fail.

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290690
2016-12-28 22:30:28 +00:00
Saleem Abdulrasool 2b59eca1f7 Revert "Add an index for Module Metadata record in the bitcode"
This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836.  Revert at
Mehdi's request as it is breaking bots.

llvm-svn: 290686
2016-12-28 20:37:22 +00:00
Mehdi Amini 32ca148198 Add an index for Module Metadata record in the bitcode
Summary:
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.

We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.

I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.

Reviewers: pcc, tejohnson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28083

llvm-svn: 290684
2016-12-28 19:44:19 +00:00
Reid Kleckner 92647369fc [WinEH] Don't assume endFunction is called while in .text
Jump table emission can switch to .rdata before
WinException::endFunction gets called. Just remember the appropriate
text section we started in and reset back to it when we end the
function. We were already switching sections back from .xdata anyway.

Fixes the first problem in PR31488, so that now COFF switch tables can
live in .rdata if we want them to.

llvm-svn: 290678
2016-12-28 19:05:12 +00:00
Chandler Carruth 05ca5acc9e [PM] Introduce a devirtualization iteration layer for the new PM.
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.

The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.

The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handes than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.

This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.

Differential Revision: https://reviews.llvm.org/D23114

llvm-svn: 290665
2016-12-28 11:07:33 +00:00
Chandler Carruth 443e57e01d [PM] Teach the CGSCC's CG update utility to more carefully invalidate
analyses when we're about to break apart an SCC.

We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
   a proxy that will convey the invalidation to functions!

Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.

Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.

The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.

While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.

With this patch, the new pass manager an build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.

llvm-svn: 290664
2016-12-28 10:34:50 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Chandler Carruth 69c5cc69ed [PM] Actually commit the test update that was supposed to accompany
r290644. Sorry for this.

llvm-svn: 290646
2016-12-28 02:31:24 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Kostya Serebryany f24e52c0c2 [sanitizer-coverage] sort the switch cases
llvm-svn: 290628
2016-12-27 21:20:06 +00:00
Chandler Carruth e14524ca30 [PM] Teach MemDep to invalidate its result object when its cached
analysis handles become invalid.

Add a test case for its invalidation logic.

llvm-svn: 290620
2016-12-27 19:33:04 +00:00
Saleem Abdulrasool 0ce0dc250c test: modernise ARM CodeGen tests
Replace the use of grep with FileCheck.  Tidy up some of the tests.  A
few of the tests have been left as weak as previously, though some have
been made more stringent.

llvm-svn: 290616
2016-12-27 18:35:19 +00:00
Chad Rosier b1ea99a956 Attempt to make the Windows bots green after r290609.
llvm-svn: 290613
2016-12-27 18:02:27 +00:00
Chandler Carruth 7a73eabf64 [PM] Add more dedicated testing to cover the invalidation logic added to
BasicAA in r290603.

I've kept the basic testing in the new PM test file as that also covers
the AAManager invalidation logic. If/when there is a good place for
broader AA testing it could move there.

This test is somewhat unsatisfying as I can't get it to fail even with
ASan outside of explicit checks of the invalidation. Apparently we don't
yet have any test coverage of the BasicAA code paths using either the
domtree or loopinfo -- I made both of them always be null and check-llvm
passed.

llvm-svn: 290612
2016-12-27 17:59:22 +00:00
Bryant Wong 7cb744621b [MemCpyOpt] Don't sink LoadInst below possible clobber.
Differential Revision: https://reviews.llvm.org/D26811

llvm-svn: 290611
2016-12-27 17:58:12 +00:00
Teresa Johnson e0ee5cf7c8 [ThinLTO] Fix "||" vs "|" mixup.
The effect of the bug was that we would incorrectly create summaries
for global and weak values defined in module asm (since we were
essentially testing for bit 1 which is SF_Undefined, and the
RecordStreamer ignores local undefined references). This would have
resulted in conservatively disabling importing of anything referencing
globals and weaks defined in module asm. Added these cases to the test
which now fails without this bug fix.

Fixes PR31459.

llvm-svn: 290610
2016-12-27 17:45:09 +00:00
Chad Rosier 2ff37b8615 [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.
Differential Revision: https://reviews.llvm.org/D27953

llvm-svn: 290609
2016-12-27 16:58:09 +00:00
Artem Tamazov a01cce8887 [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.

Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
	value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.

Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.

Test added.

Differential Revision: https://reviews.llvm.org/D27859

llvm-svn: 290608
2016-12-27 16:00:11 +00:00
Chandler Carruth aa35167578 [PM] Teach BasicAA how to invalidate its result object.
This requires custom handling because BasicAA caches handles to other
analyses and so it needs to trigger indirect invalidation.

This fixes one of the common crashes when using the new PM in real
pipelines. I've also tweaked a regression test to check that we are at
least handling the most immediate case.

I'm going to work at re-structuring this test some to both scale better
(rather than all being in one file) and check more invalidation paths in
a follow-up commit, but I wanted to get the basic bug fix in place.

llvm-svn: 290603
2016-12-27 10:30:45 +00:00
Chandler Carruth 81c8edaf5c [PM] Disable more of the loop passes -- LCSSA and LoopSimplify are also
not really wired into the loop pass manager in a way that will let us
productively use these passes yet.

This lets the new PM get farther in basic testing which is useful for
establishing a good baseline of "doesn't explode". There are still
plenty of crashers in basic testing though, this just gets rid of some
noise that is well understood and not representing a specific or narrow
bug.

llvm-svn: 290601
2016-12-27 10:16:46 +00:00
Sam Kolton e66365e07d [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28051

llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Eugene Leviant 920908352a RuntimeDyldELF: add R_AARCH64_ADD_ABS_LO12_NC reloc
Differential revision: https://reviews.llvm.org/D28115

llvm-svn: 290598
2016-12-27 09:51:38 +00:00
Chandler Carruth 17c630a09c [PM] Teach the AAManager and AAResults layer (the worst offender for
inter-analysis dependencies) to use the new invalidation infrastructure.

This teaches it to invalidate itself when any of the peer function
AA results that it uses become invalid. We do this by just tracking the
originating IDs. I've kept it in a somewhat clunky API since some users
of AAResults are outside the new PM right now. We can clean this API up
if/when those users go away.

Secondly, it uses the registration on the outer analysis manager proxy
to trigger deferred invalidation when a module analysis result becomes
invalid.

I've included test cases that specifically try to trigger use-after-free
in both of these cases and they would crash or hang pretty horribly for
me even without ASan. Now they work nicely.

The `InvalidateAnalysis` utility pass required some tweaking to be
useful in this context and it still is pretty garbage. I'd like to
switch it back to the previous implementation and teach the explicit
invalidate method on the AnalysisManager to take care of correctly
triggering indirect invalidation, but I wanted to go ahead and send this
out so folks could see how all of this stuff works together in practice.
And, you know, that it does actually work. =]

Differential Revision: https://reviews.llvm.org/D27205

llvm-svn: 290595
2016-12-27 08:44:39 +00:00
Chandler Carruth 625038d5d5 [PM] Turn on the new PM's inliner in addition to the current one for
most of the inliner test cases.

The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.

There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.

llvm-svn: 290592
2016-12-27 07:18:43 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth db6ced8484 [PM] Wire up another test to the new pass manager.
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.

llvm-svn: 290589
2016-12-27 06:46:16 +00:00
George Burgess IV ed16024a9b [Analysis] Ignore `nobuiltin` on `allocsize` function calls.
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.

llvm-svn: 290588
2016-12-27 06:32:14 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Craig Topper 89b3e0223f [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.

llvm-svn: 290573
2016-12-27 03:46:05 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth 62c8b81ea8 [Inliner] Modernize all of the inliner tests that were using grep.
This mostly involved converting from grep to FileCheck and tidying up
the IR used.

In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.

llvm-svn: 290570
2016-12-27 02:47:37 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper a0439377e6 [InstCombine][AVX-512] Add masked scalar add/sub/mul/div intrinsic test cases that don't have a CUR_DIRECTION rounding mode.
The CUR_DIRECTION case will be optimized in a future commit so this provides coverage for the other cases.

llvm-svn: 290565
2016-12-27 01:56:27 +00:00
Craig Topper 83f2145c18 [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.
llvm-svn: 290564
2016-12-27 01:56:24 +00:00
Craig Topper 5035b1212b [AVX-512] Add tests to show missed opportunities for combining masking with scalar arithmetic operations.
These particular sequences will be generated after a future change to teach InstCombine to turn masked scalar arithmetic intrinsics into native IR.

llvm-svn: 290563
2016-12-27 01:56:22 +00:00
Chandler Carruth 0ee8bb11c3 [PM] Move the collection of call sites to a more appropriate place
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.

Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.

I've also added a short circuit to the scan for call sites based on what
other code is doing.

With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.

I'll make my way through most of the other inliner test cases
just to get some easy coverage next.

llvm-svn: 290562
2016-12-27 01:24:50 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Daniel Berlin d59e8010c5 Don't use our own incorrect version of isTriviallyDeadInstruction in NewGVN. Fixes PR/31472
llvm-svn: 290549
2016-12-26 18:44:36 +00:00
Davide Italiano 8ea5e4fcae [NewGVN] Change test to reflect difference between GVN and NewGVN.
The current GVN algorithm folds unconditional branches to, it claims,
expose more PRE oportunities. The folding, if really needed,
(which is not sure, as it's not really proved it improves analysis)
can be done by an earlier cleanup pass instead of GVN itself.
Ack'ed/SGTM'd by Daniel Berlin.

Differential Revision:  https://reviews.llvm.org/D28117

llvm-svn: 290546
2016-12-26 18:10:09 +00:00
Simon Pilgrim e8a5ab35ca [X86][AVX512] Added v64i8 reverse shuffle test (PR31470)
llvm-svn: 290544
2016-12-26 17:38:58 +00:00
Bryant Wong b5e03b61e2 [InstCombiner] Simplify lib calls to `round{,f}`
Differential Revision: https://reviews.llvm.org/D28110

llvm-svn: 290542
2016-12-26 14:29:29 +00:00
Chandler Carruth 80db76d556 Test the different scenarios of GlobalDCE and comdats more
systematically and document in the test what all is going on.

This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.

For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.

Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.

llvm-svn: 290537
2016-12-26 08:54:01 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
Bryant Wong a07d9b1460 [AliasAnalysis] Teach BasicAA about memcpy.
Differential Revision: https://reviews.llvm.org/D27034

llvm-svn: 290526
2016-12-25 22:42:27 +00:00
Daniel Berlin d7c12ee54c Value number stores and memory states so we can detect when memory states are equivalent (IE store of same value to memory).
Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28084

llvm-svn: 290525
2016-12-25 22:23:49 +00:00
Amjad Aboud 7faeecc8f7 [DebugInfo] Added support for Checksum debug info feature.
Differential Revision: https://reviews.llvm.org/D27642

llvm-svn: 290514
2016-12-25 10:12:09 +00:00
Simon Pilgrim 3265d951b6 [InstCombine][X86] Add tests showing missed opportunities to simplify PMULUDQ/PMULDQ inputs.
PMULUDQ/PMULDQ - only the even elements (0, 2, 4, 6) of the vXi32 inputs are required.

llvm-svn: 290502
2016-12-24 17:30:19 +00:00
Chandler Carruth cdfdd4330a [PM] Remove a bunch of junk that snuck in when I failed at manipulating
my editor to close and commit the patch. Sorry for the noise.

llvm-svn: 290460
2016-12-23 23:39:31 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Chandler Carruth f32f63f222 [PM] Clean up test case and comments a bit. NFC.
llvm-svn: 290456
2016-12-23 23:33:32 +00:00
Chandler Carruth 060ad61fbe [PM] Add support for building a default AA pipeline to the PassBuilder.
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]

Differential Revision: https://reviews.llvm.org/D28076

llvm-svn: 290449
2016-12-23 20:38:19 +00:00
Davide Italiano 34f94384a5 [LICM] Work around LICM needs to maintain state across loops.
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.

Differential Revision:  https://reviews.llvm.org/D25848

llvm-svn: 290427
2016-12-23 13:12:50 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Chandler Carruth eb119ece4a Fix some DOS-style line endings that I suspect snuck in from one of the
frustrating Subversion clients that fails to do line ending translation
of text files.

llvm-svn: 290404
2016-12-23 02:02:26 +00:00
Sanjoy Das 9a129807f3 Reimplement depedency tracking in the ImplicitNullChecks pass
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason.  Please review this as essentially a new pass.  The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.

The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits.  The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 is practice.

This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection.  It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).

Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambda into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.

Reviewers: reames, anna, atrick

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27592

llvm-svn: 290394
2016-12-23 00:41:21 +00:00
Quentin Colombet 3749f33888 [GlobalISel] More fix for the size vs. type typo. NFC.
I missed those in my previous commit (r290378).

llvm-svn: 290387
2016-12-22 22:50:34 +00:00
Chris Bieneman e0e451d927 [ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.

This re-lands r290147, reverted in 290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big endian systems.

After adding support for preserving endianness, this should be good now.

llvm-svn: 290386
2016-12-22 22:44:27 +00:00
Ahmed Bougacha 1277833aa6 [AArch64] Simplify indexed-memory testcase. NFC.
We're only testing the addressing mode on the stores; we don't
need to load/store pointers we can simply pass/return.

llvm-svn: 290385
2016-12-22 22:27:05 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Chris Bieneman e477fb9591 [ObjectYAML] Fixing big endian bots from r290381
Bot URL:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/2505

llvm-svn: 290383
2016-12-22 22:16:04 +00:00
Chris Bieneman 55de3a2449 [ObjectYAML] MachO support for endianness
This patch adds support to the macho<->yaml tools for preserving endianness in MachO structures and DWARF data.

llvm-svn: 290381
2016-12-22 21:58:03 +00:00
Quentin Colombet f372150f73 [AArch64] Change a test to use a generic instr instead of a target specific one.
Target specific instructions have requirements that are not compatible
with what we want to test here. Namely, target specific instructions
must have their operands properly mapped on register classes.

llvm-svn: 290379
2016-12-22 21:56:37 +00:00
Quentin Colombet e08cc599b8 [MIRParser] Fix a typo in comment and error message.
We have long switched from size to type.

llvm-svn: 290378
2016-12-22 21:56:35 +00:00
Quentin Colombet f38015e5fe [AArch64][CallLowering] Constraint registers on target specific instruction
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.

This fixes invalid register classes on call instruction.

llvm-svn: 290377
2016-12-22 21:56:31 +00:00
Quentin Colombet 9751e61fe1 [MIRParser] Non-generic virtual register may have a type.
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.

llvm-svn: 290376
2016-12-22 21:56:29 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Tim Shen 53ddc1d0f4 [PowerPC] Add ppc support to update_llc_test_checks.py, and ppc tests. NFC.
Reviewers: chandlerc, hfinkel, echristo, iteratee

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28036

llvm-svn: 290370
2016-12-22 20:59:39 +00:00
Wei Mi a2f0b594c2 Redo store splitting in CodeGenPrepare.
This is a succeeding patch of https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redoing
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.

Differential Revision: https://reviews.llvm.org/D25914

llvm-svn: 290365
2016-12-22 19:44:45 +00:00
Petar Jovanovic 8a4e63994e [mips] Fix compact branch hazard detection, part 2
Follow up to D27209 fix, this patch now properly handles single transient
instruction in basic block.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D27856

llvm-svn: 290361
2016-12-22 19:29:50 +00:00
Krzysztof Parzyszek df24da221e Fix two bugs in the pipeliner in renaming phis in the prolog and epilog
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.

Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.

Patch by Brendon Cahoon.

llvm-svn: 290355
2016-12-22 18:49:55 +00:00
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Davide Italiano e05e3306a3 [NewGVN] Add the pass to PassRegistry.def.
We need to hook up here to get it working with the new PM.
Add a test while here (and remove a typo).

llvm-svn: 290350
2016-12-22 16:35:02 +00:00
Matt Arsenault 3c97e2030a AMDGPU: Fix missing 16-bit cmpx instructions
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Davide Italiano 7e274e02ae [GVN] Initial check-in of a new global value numbering algorithm.
The code have been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...

The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The test currently living in test/Transform/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.

Differential Revision:  https://reviews.llvm.org/D26224

llvm-svn: 290346
2016-12-22 16:03:48 +00:00
Dan Gohman 728926ac59 [WebAssembly] Don't old negative load/store offsets in fast-isel.
WebAssembly's load/store offsets are unsigned and don't wrap, so it's not
valid to fold in a negative offset.

llvm-svn: 290342
2016-12-22 15:15:10 +00:00
Sam Kolton a6792a39c4 [AMDGPU] Disassembler: fix for disaasembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands.

Reviewers: nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27847

llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Ayman Musa 9ff608cdc6 [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.

Differential Revision: https://reviews.llvm.org/D28024

llvm-svn: 290333
2016-12-22 08:42:46 +00:00
Chandler Carruth 0d1d49507b [PM] Loosen the check ever so slightly -- MSVC appears to not include
a space after the comma in template arguments with our hacky type name
system.

llvm-svn: 290331
2016-12-22 07:53:20 +00:00
Chandler Carruth ee6865f425 [PM] Make a couple of CHECK lines a bit more precise, NFC.
I was staring at these and didn't realize these were module-layer
proxies as opposed to some other layer. Justin and I have a plan to
rename things to make the names themselves much easier to reason about,
but I at least want the CHECK lines to be precise for now.

llvm-svn: 290328
2016-12-22 07:14:35 +00:00
Chandler Carruth e3f5064b72 [PM] Introduce a reasonable port of the main per-module pass pipeline
from the old pass manager in the new one.

I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.

I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.

Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.

One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.

I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.

I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.

Differential Revision: https://reviews.llvm.org/D28042

llvm-svn: 290325
2016-12-22 06:59:15 +00:00
Adrian Prantl 5542da4bbc Fix an assertion in DwarfExpression when emitting fragments in vector registers
When DwarfExpression is emitting a fragment that is located in a
register and that fragment is smaller than the register, and the
register must be composed from sub-registers (are you still with me?)
the last DW_OP_piece operation must not be larger than the size of the
fragment itself, since the last piece of the fragment could be smaller
than the last subregister that is being emitted.

rdar://problem/29779065

llvm-svn: 290324
2016-12-22 06:10:41 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault 46e6b7adef AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault 770ec8680a AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault ef82ad94ea AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault 9e22bc2cd3 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault 2920f62423 AMDGPU: setcc test cleanup
llvm-svn: 290306
2016-12-22 03:21:45 +00:00
Matt Arsenault cdff21b14e AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault 4052a576c0 AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault ce84130f85 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Haicheng Wu 9ac20a1e10 [AArch64] Correct the check of signed 9-bit imm in getIndexedAddressParts().
-256 is a legal indexed address part.

Differential Revision: https://reviews.llvm.org/D27537

llvm-svn: 290296
2016-12-22 01:39:24 +00:00
David Majnemer 5fa7d48bb8 [NVVMIntrRange] Only set range metadata if none is already present
The range metadata inserted by NVVMIntrRange is pessimistic, range
metadata already present could be more precise.

llvm-svn: 290294
2016-12-22 00:51:59 +00:00
Adrian Prantl 1eadba1c8c Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Adrian Prantl 58c1910642 [LLParser] Make the line field of DIMacro(File) optional.
Otherwise these records do not survive roundtrips.

llvm-svn: 290291
2016-12-22 00:29:00 +00:00
Adrian Prantl ec9ebba778 Legalize metadata in legacy testcases
llvm-svn: 290288
2016-12-21 23:38:17 +00:00
Adrian Prantl 762e4b72c6 Legalize metadata in legacy testcases
llvm-svn: 290287
2016-12-21 23:36:06 +00:00
Adrian Prantl aad5df484c Legalize metadata in legacy testcases
llvm-svn: 290286
2016-12-21 23:30:35 +00:00
Adrian Prantl b767f31290 Legalize metadata in legacy testcases
llvm-svn: 290285
2016-12-21 23:28:49 +00:00
Peter Collingbourne 1b4137a7f9 IR: Function summary representation for type tests.
Each function summary has an attached list of type identifier GUIDs. The
idea is that during the regular LTO phase we would match these GUIDs to type
identifiers defined by the regular LTO module and store the resolutions in
a top-level "type identifier summary" (which will be implemented separately).

Differential Revision: https://reviews.llvm.org/D27967

llvm-svn: 290280
2016-12-21 23:03:45 +00:00
Mike Aizatsky 987f6420ac [sancov] hash prefix results in huge merge files, use shorter prefix
llvm-svn: 290277
2016-12-21 22:09:57 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
David Majnemer b0761a0c1b Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
Michael Kuperstein 88f15eedbb [LLParser] Parse vector GEP constant expression correctly
The constantexpr parsing was too constrained and rejected legal vector GEPs.
This relaxes it to be similar to the ones for instruction parsing.

This fixes PR30816.

Differential Revision: https://reviews.llvm.org/D28013

llvm-svn: 290261
2016-12-21 18:29:47 +00:00
Michael Kuperstein dd92c78669 [ConstantFolding] Fix vector GEPs harder
For vector GEPs, CastGEPIndices can end up in an infinite recursion, because
we compare the vector type to the scalar pointer type, find them different,
and then try to cast a type to itself.

Differential Revision: https://reviews.llvm.org/D28009

llvm-svn: 290260
2016-12-21 17:34:21 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Oren Ben Simhon cecc4af496 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing failing test.

llvm-svn: 290246
2016-12-21 09:18:37 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. 
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Adam Nemet 32e6a34c02 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).

As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.

clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
Sebastian Pop 1857800cb5 remove pretty-print test that requires debug
There is no need to test the pretty printer. Remove the boggus test to make the
build bots happy.

llvm-svn: 290234
2016-12-21 03:37:39 +00:00
Sebastian Pop 7779484313 machine combiner: fix pretty printer
we used to print UNKNOWN instructions when the instruction to be printer was not
yet inserted in any BB: in that case the pretty printer would not be able to
compute a TII as the instruction does not belong to any BB or function yet.
This patch explicitly passes the TII to the pretty-printer.

Differential Revision: https://reviews.llvm.org/D27645

llvm-svn: 290228
2016-12-21 01:41:12 +00:00
George Burgess IV 3f08914e7e [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Chris Bieneman abecaa2f8c Revert "[ObjectYAML] Support for DWARF debug_info section"
This reverts commit r290204.

Still breaking bots... In a meeting now, so I can't fix it immediately.

Bot URL:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/2415

llvm-svn: 290209
2016-12-20 22:36:42 +00:00
Chris Bieneman ffc4aef542 [ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.

This re-lands r290147, after fixing the issue that caused bots to fail (thank you UBSan!).

llvm-svn: 290204
2016-12-20 21:35:31 +00:00
Peter Collingbourne 0c30f089d5 IR: Eliminate non-determinism in the module summary analysis.
Also make the summary ref and call graph vectors immutable. This means
a smaller API surface and fewer places to audit for non-determinism.

Differential Revision: https://reviews.llvm.org/D27875

llvm-svn: 290200
2016-12-20 21:12:28 +00:00
Eli Friedman d03df8145f [ARM] Implement isExtractSubvectorCheap.
See https://reviews.llvm.org/D6678 for the history of
isExtractSubvectorCheap. Essentially the same considerations apply
to ARM.

This temporarily breaks the formation of vpadd/vpaddl in certain cases;
AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles.
See https://reviews.llvm.org/D27779 for followup fix.

Differential Revision: https://reviews.llvm.org/D27774

llvm-svn: 290198
2016-12-20 20:05:07 +00:00
Eli Friedman 7ce3d79986 [ARM] Generate checks for shuffle tests using update_llc_test_checks.py.
llvm-svn: 290196
2016-12-20 19:33:24 +00:00
Matt Arsenault 9e91014282 AMDGPU: Allow 16-bit types in inline asm constraints
llvm-svn: 290193
2016-12-20 19:06:12 +00:00
Matt Arsenault d1ceffcd5a AMDGPU: Run fp combine tests on VI
llvm-svn: 290192
2016-12-20 18:55:11 +00:00
Matt Arsenault 4c1e9ec008 AMDGPU: Don't add same instruction multiple times to worklist
When the instruction is processed the first time, it may be
deleted resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash
but I'm not sure why.

llvm-svn: 290191
2016-12-20 18:55:06 +00:00
Tom Stellard 6f9ef14b9d AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*
Reviewers: arsenm, nhaehnle, mareko

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27834

llvm-svn: 290184
2016-12-20 17:19:44 +00:00
Tom Stellard 244891d129 AMDGPU/SI: Add a MachineMemOperand to MIMG instructions
Summary:
Without a MachineMemOperand, the scheduler was assuming MIMG instructions
were ordered memory references, so no loads or stores could be reordered
across them.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27536

llvm-svn: 290179
2016-12-20 15:52:17 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Chris Bieneman 891cbcc093 Revert "[ObjectYAML] Support for DWARF debug_info section"
This reverts commit r290147.

This commit is breaking a bot (http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/621). I don't have time to investigate at the moment, so I'll revert for now.

llvm-svn: 290148
2016-12-20 00:42:06 +00:00
Chris Bieneman b5b0b23a25 [ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.

llvm-svn: 290147
2016-12-20 00:26:24 +00:00
Eli Friedman 1a9a887a29 Add ARM support to update_llc_test_checks.py
Just the minimal support to get it working at the moment.

Includes checks for test/CodeGen/ARM/vzip.ll as an example.

Differential Revision: https://reviews.llvm.org/D27829

llvm-svn: 290144
2016-12-19 23:09:51 +00:00
Chris Bieneman d9430944f4 [ObjectYAML] Support for DWARF Pub Sections
This patch adds support for YAML<->DWARF round tripping for pub* section data. The patch supports both GNU and non-GNU style entries.

llvm-svn: 290139
2016-12-19 22:22:12 +00:00
Sanjay Patel 5a443ac000 [InstCombine] use commutative matcher for pattern with commutative operators
This is a case that was missed in:
https://reviews.llvm.org/rL290067
...and it would regress if we fix operand complexity (PR28296).

llvm-svn: 290127
2016-12-19 18:35:37 +00:00
Sanjay Patel dd46b52942 [InstCombine] add folds for icmp (umin|umax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531)
https://reviews.llvm.org/rL290111

llvm-svn: 290118
2016-12-19 17:32:37 +00:00
Florian Hahn 2e03213f90 [LoopVersioning] Require loop-simplify form for loop versioning.
Summary:
Requiring loop-simplify form for loop versioning ensures that the
runtime check block always dominates the exit block.
    
This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958).

Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema

Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27469

llvm-svn: 290116
2016-12-19 17:13:37 +00:00
Konstantin Zhuravlyov 980688cdaf [AMDGPU] When unifying metadata, add operands to named metadata individually
Differential Revision: https://reviews.llvm.org/D27725

llvm-svn: 290114
2016-12-19 16:54:24 +00:00
Sanjay Patel 8296c6c96f [InstCombine] add folds for icmp (smax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (D27531)

llvm-svn: 290111
2016-12-19 16:28:53 +00:00
Diana Picus 3bae486b7f [ARM] GlobalISel: Add more checks to test
llvm-svn: 290108
2016-12-19 14:08:11 +00:00
Diana Picus 58265c4788 [ARM] GlobalISel: Minor style fixup in test
llvm-svn: 290107
2016-12-19 14:08:06 +00:00
Diana Picus 97ae95c3a8 [ARM] GlobalISel: Lower i8 and i16 register args
This allows lowering i8 and i16 arguments if they can fit in the registers. Note
that the lowering is incomplete - ABI extensions are handled in a subsequent
patch.

(Last part of)
Differential Revision: https://reviews.llvm.org/D27704

llvm-svn: 290106
2016-12-19 14:08:02 +00:00
Diana Picus 5a724452a0 [ARM] GlobalISel: Allow i8 and i16 adds
Teach the instruction selector and legalizer that it's ok to have adds with 8 or
16-bit integers.

This is the second part of https://reviews.llvm.org/D27704

llvm-svn: 290105
2016-12-19 14:07:56 +00:00
Diana Picus 36aa09fa3c [ARM] GlobalISel: Select i8 and i16 copies
Teach the instruction selector that it's ok to copy small values from physical
registers.

First part of https://reviews.llvm.org/D27704

llvm-svn: 290104
2016-12-19 14:07:50 +00:00
Diana Picus 1437f6d710 [ARM] GlobalISel: Lower more than 4 arguments
This adds support for lowering more than 4 arguments (although still i32 only).
It uses the handleAssignments / ValueHandler infrastructure extracted from
the AArch64 backend in r288658.

Differential Revision: https://reviews.llvm.org/D27195

llvm-svn: 290098
2016-12-19 11:55:41 +00:00
Sam Kolton 69c8aa26d8 AMDGPU: [AMDGPU] Assembler: add .hsa_code_object_metadata directive for functime metadata V2.0
Summary:
Added pair of directives .hsa_code_object_metadata/.end_hsa_code_object_metadata.
Between them user can put YAML string that would be directly put to the generated note. E.g.:
'''
.hsa_code_object_metadata
    {
        amd.MDVersion: [ 2, 0 ]
    }
.end_hsa_code_object_metadata
'''
Based on D25046

Reviewers: vpykhtin, nhaustov, yaxunl, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, mgorny, tony-tye

Differential Revision: https://reviews.llvm.org/D27619

llvm-svn: 290097
2016-12-19 11:43:15 +00:00
Diana Picus 519807f7be [ARM] GlobalISel: Support loading from the stack
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit
scalars only). This will be useful for functions that need to pass arguments on
the stack.

First part of https://reviews.llvm.org/D27195.

llvm-svn: 290096
2016-12-19 11:26:31 +00:00
Dean Michael Berris 03b8be575e [XRay] Fix assertion failure on empty machine basic blocks (PR 31424)
The original version of the code in XRayInstrumentation.cpp assumed that
functions may not have empty machine basic blocks (or that the first one
couldn't be). This change addresses that by special-casing that specific
situation.

We provide two .mir test-cases to make sure we're handling this
appropriately.

Fixes llvm.org/PR31424.

Reviewers: chandlerc

Subscribers: varno, llvm-commits

Differential Revision: https://reviews.llvm.org/D27913

llvm-svn: 290091
2016-12-19 09:20:38 +00:00
Daniel Jasper f5123fecfe Add files I seem to have dropped in my revert (r290086).
Sorry!

llvm-svn: 290087
2016-12-19 08:32:13 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Tom de Vries 601d5bafb2 [FileCheck] Fix --strict-whitespace --match-full-lines -- add test-case
Add test-case that was missing in "[FileCheck] Fix --strict-whitespace
--match-full-lines" commit.

llvm-svn: 290070
2016-12-18 21:04:47 +00:00
Sanjay Patel 2b9d4b4daf [InstCombine] use commutative matchers for patterns with commutative operators
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296

I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.

But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.

We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but 
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted 
patterns because they require adjustments to the match order.

I'm aware of the concern about the potential compile-time performance impact of adding 
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.

Differential Revision: https://reviews.llvm.org/D24419

llvm-svn: 290067
2016-12-18 18:49:48 +00:00
Daniel Jasper 373f9a6a0c Revert r289955 and r289962. This is causing lots of ASAN failures for us.
Not sure whether it causes and ASAN false positive or whether it
actually leads to incorrect code or whether it even exposes bad code.
Hans, I'll get you instructions to reproduce this.

llvm-svn: 290066
2016-12-18 14:36:38 +00:00
Michael Zuckerman 4b88a770ef [X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC.
Commit on behalf of Gadi Haber  

Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified.
The changed encodings are validated with XED.
Rviewers: delena, igorb

Differential revision: https://reviews.llvm.org/D27802

llvm-svn: 290065
2016-12-18 14:29:00 +00:00
Simon Pilgrim e940daf532 [X86][SSE] Add support for combining target shuffles to SHUFPS.
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point.

llvm-svn: 290064
2016-12-18 14:26:02 +00:00
Craig Topper 7029db0eaa [X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed.
These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine.

For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode.

For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now.

llvm-svn: 290060
2016-12-18 07:54:23 +00:00
Craig Topper add9cc697a [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available.
This can give the register allocator more registers to use.

llvm-svn: 290057
2016-12-18 06:23:14 +00:00
Craig Topper 2baef8f466 [AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars. I missed this in r290049.
llvm-svn: 290055
2016-12-18 04:17:00 +00:00
Matt Arsenault 3aaf11fbd4 AMDGPU: Fix broken check prefix in test
llvm-svn: 290050
2016-12-17 20:03:59 +00:00
Craig Topper d3295c6a3a [AVX-512] Use EVEX encoded logic operations for scalar types when they are available. This gives the register allocator more registers to work with.
llvm-svn: 290049
2016-12-17 19:26:00 +00:00
Craig Topper 81b021e7c0 [AVX-512] Update scalar logic test to show missed opportunity to use EVEX encoded logic instructions to get more registers to use.
llvm-svn: 290048
2016-12-17 19:25:55 +00:00
Matthias Braun 15b56e6973 Revert "AArch64CollectLOH: Rewrite as block-local analysis."
It is still breaking Chrome. http://llvm.org/PR31361

This reverts commit r290026.

llvm-svn: 290047
2016-12-17 18:53:11 +00:00
Matthias Braun 9439d3dc2c Move test to correct directory
See also test/CodeGen/MIR/README

llvm-svn: 290032
2016-12-17 02:16:26 +00:00
Evgeniy Stepanov 95294127d0 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Matthias Braun e813cf457a AArch64CollectLOH: Rewrite as block-local analysis.
Re-apply r288561: Liveness tracking should be correct now after r290014.

Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.

This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.

Differential Revision: https://reviews.llvm.org/D27329

llvm-svn: 290026
2016-12-17 01:15:59 +00:00
Mike Aizatsky f07f9f8b5f [sancov] skip dead files from computations
Differential Revision: https://reviews.llvm.org/D27863

llvm-svn: 290017
2016-12-17 00:11:48 +00:00
Matthias Braun 76bb4139dc AArch64: Enable post-ra liveness updates
Differential Revision: https://reviews.llvm.org/D27559

llvm-svn: 290014
2016-12-16 23:55:43 +00:00
Paul Robinson 2dfb688214 Allow "line 0" to be the first explicit debug location in a function.
Feedback on r289468 from Adrian Prantl.

llvm-svn: 290012
2016-12-16 23:54:33 +00:00
Kevin Enderby 59343a9429 Fix a bugs with using some Mach-O command line flags like "-arch armv7m".
The Mach-O command line flag like "-arch armv7m" does not match the
arch name part of its llvm Triple which is "thumbv7m-apple-darwin”.

I think the best way to fix this is to have
llvm::object::MachOObjectFile::getArchTriple() optionally return the
name of the Mach-O arch flag that would be used with -arch that
matches the CPUType and CPUSubType.  Then change
llvm::object::MachOUniversalBinary::ObjectForArch::getArchTypeName()
to use that and change it to getArchFlagName() as the type name is
really part of the Triple and the -arch flag name is a Mach-O thing
for a specific Triple with a specific Mcpu value.

rdar://29663637

llvm-svn: 290001
2016-12-16 22:54:02 +00:00
Zachary Turner 46225b193f Resubmit "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
The original patch was broken due to some undefined behavior
as well as warnings that were triggering -Werror.

llvm-svn: 290000
2016-12-16 22:48:14 +00:00
Teresa Johnson a61f5e3796 [ThinLTO] Import composite types as declarations
Summary:
When reading the metadata bitcode, create a type declaration when
possible for composite types when we are importing. Doing this in
the bitcode reader saves memory. Also it works naturally in the case
when the type ODR map contains a definition for the same composite type
because it was used in the importing module (buildODRType will
automatically use the existing definition and not create a type
declaration).

For Chromium built with -g2, this reduces the aggregate size of the
generated native object files by 66% (from 31G to 10G). It reduced
the time through the ThinLTO link and backend phases by about 20% on
my machine.

Reviewers: mehdi_amini, dblaikie, aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27775

llvm-svn: 289993
2016-12-16 21:25:01 +00:00
Michael Kuperstein 3ca147ea3d Preserve loop metadata when folding branches to a common destination.
Differential Revision: https://reviews.llvm.org/D27830

llvm-svn: 289992
2016-12-16 21:23:59 +00:00
Jun Bum Lim 90b6b5074a [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Zachary Turner d0fffd1d14 Revert "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
This reverts commit r289978, which is failing due to some rebase/merge
issues.

llvm-svn: 289981
2016-12-16 19:25:23 +00:00
Zachary Turner a4e7dfbc16 [CodeView] Hook CodeViewRecordIO for reading/writing symbols.
This is the 3rd of 3 patches to get reading and writing of
CodeView symbol and type records to use a single codepath.

Differential Revision: https://reviews.llvm.org/D26427

llvm-svn: 289978
2016-12-16 19:20:35 +00:00
Mehdi Amini 8662305bae Strip invalid TBAA when reading bitcode
This ensures backward compatibility on bitcode loading.

Differential Revision: https://reviews.llvm.org/D27839

llvm-svn: 289977
2016-12-16 19:16:29 +00:00
Matthew Simpson a4964f291a Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Sanjoy Das 089c699743 Fix CodeGenPrepare::stripInvariantGroupMetadata
`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs.  The
only reason it worked at all is that `getMetadataID` returns something
unrelated -- it returns the subclass ID of the receiver (which is used
in `dyn_cast` etc.).  That does not numerically match
`LLVMContext::MD_invariant_group` and ends up dropping `invariant_group`
along with every other metadata that does not numerically match
`LLVMContext::MD_invariant_group`.

llvm-svn: 289973
2016-12-16 18:52:33 +00:00
Eli Friedman f624ec27b7 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Recommiting with fix to avoid forming vld1dup if the type of the load
doesn't match the type of the vdup (see
https://llvm.org/bugs/show_bug.cgi?id=31404).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289972
2016-12-16 18:44:08 +00:00
Matt Arsenault 55e7d65b12 AMDGPU: Fix name for v_ashrrev_i16
llvm-svn: 289967
2016-12-16 17:40:11 +00:00
David Blaikie 7d4a5599da Revert "dwarfdump: Support/process relocations on a CU's abbrev_off"
Reverting because this breaks lld's gdb_index support - it's probably
double counting the abbrev relocation offset.

This reverts commit r289954.

llvm-svn: 289961
2016-12-16 17:10:17 +00:00
Jun Bum Lim f9416af191 Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Sanjay Patel 2d82aa7af7 [InstCombine] auto-generate checks; NFC
llvm-svn: 289959
2016-12-16 16:58:54 +00:00
Matthew Simpson 099af810de [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Dehao Chen 2797800595 Pass sample pgo flags to thinlto.
Summary: ThinLTO needs to invoke SampleProfileLoader pass during link time in order to annotate profile correctly after module importing.

Reviewers: davidxl, mehdi_amini, tejohnson

Subscribers: pcc, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27790

llvm-svn: 289957
2016-12-16 16:48:46 +00:00
Hans Wennborg 35f21cba13 [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.

This targets a pattern that occurs frequently with reference counting pointers:

  void decrement(long volatile *ptr) {
    if (_InterlockedDecrement(ptr) == 0)
      release();
  }

Clang would previously compile it (for 32-bit at -Os) as:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   31 c9                   xor    %ecx,%ecx
   6:   49                      dec    %ecx
   7:   f0 0f c1 08             lock xadd %ecx,(%eax)
   b:   83 f9 01                cmp    $0x1,%ecx
   e:   0f 84 00 00 00 00       je     14 <?decrement@@YAXPCJ@Z+0x14>
  14:   c3                      ret

and with this patch it becomes:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   f0 ff 08                lock decl (%eax)
   7:   0f 84 00 00 00 00       je     d <?decrement@@YAXPCJ@Z+0xd>
   d:   c3                      ret

(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)

Differential Revision: https://reviews.llvm.org/D27781

llvm-svn: 289955
2016-12-16 16:34:59 +00:00
David Blaikie e9fda9f201 dwarfdump: Support/process relocations on a CU's abbrev_off
Input can be produced by ld -r, for example (a normal LLVM workflow
never hits this - LLVM only ever produces a single abbrev table in an
object (shared by multiple CUs), so the reloc's always 0, and when it's
linked together the relocation's resolved so it doesn't need to be
handled)

llvm-svn: 289954
2016-12-16 16:31:10 +00:00
Jun Bum Lim 85347dde27 [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Simon Pilgrim 9519bd9232 [X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289946
2016-12-16 14:30:04 +00:00
Simon Pilgrim 224416a9e4 [X86][AVX512] Add tests showing missed opportunity to efficiently lower v16i32 to VSHUFPS (PR27885)
llvm-svn: 289945
2016-12-16 14:21:57 +00:00
Nico Weber c4d695e25b Speculatively revert r289925, see PR31407
llvm-svn: 289944
2016-12-16 14:02:28 +00:00
Diana Picus 812caee65a [ARM] GlobalISel: Select add i32, i32
Add the minimal support necessary to select a function that returns the sum of
two i32 values.

This includes some support for argument/return lowering of i32 values through
registers, as well as the handling of copy and add instructions throughout the
GlobalISel pipeline.

Differential Revision: https://reviews.llvm.org/D26677

llvm-svn: 289940
2016-12-16 12:54:46 +00:00
Simon Pilgrim f159a3414f [X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.
We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead.

llvm-svn: 289937
2016-12-16 11:48:51 +00:00
Dylan McKay a81719fbfc [AVR] Add a test for 64-bit left shifts
llvm-svn: 289936
2016-12-16 11:40:00 +00:00
Chandler Carruth 48b4e614d8 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Andrew V. Tischenko 5f643ad847 Extra coverage tests to demonstrate fixes in D72618 and D26855
llvm-svn: 289931
2016-12-16 09:56:02 +00:00
Chandler Carruth 05e80d31bd Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.

I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.

llvm-svn: 289928
2016-12-16 07:31:20 +00:00
Ekaterina Romanova 25da8a9b53 Update .debug_line section version information to match DWARF version.
One more attempt to re-commit the patch r285355, which I had to revert in r285362, because some tests were failing (the reason is because the size of the line_table varied depending on the full file name).

In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler.

This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted.

Differential Revision: https://reviews.llvm.org/D16697

llvm-svn: 289925
2016-12-16 05:10:11 +00:00
Nico Weber 11c2e6ee3a Revert 279703, it caused PR31404.
llvm-svn: 289923
2016-12-16 04:51:25 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Ehsan Amiri 741b387563 [PPC] corrections in two testcases
Removing sensitivity to scheduling (by using CHECK-DAG instead of CHECK) and
some other minor corrections.

In preparation to commit Power9 processor model.

llvm-svn: 289900
2016-12-16 00:33:07 +00:00
Peter Collingbourne 1398a32e28 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Teresa Johnson 19f2aa7891 [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896
2016-12-15 23:50:06 +00:00
Davide Italiano 67e979e086 [SimplifyLibCalls] Add a test to make sure we lower fls(0) correctly.
llvm-svn: 289895
2016-12-15 23:48:07 +00:00
Davide Italiano 85ad36b0e0 [SimplifyLibCalls] Lower fls() to llvm.ctlz().
Differential Revision:  https://reviews.llvm.org/D14590

llvm-svn: 289894
2016-12-15 23:45:11 +00:00
David Blaikie 2e1626879e DebugInfo: Make a Generic test case actually generic (remove datalayout/triple)
llvm-svn: 289893
2016-12-15 23:39:25 +00:00
Quentin Colombet 327f942876 [IRTranslator] Merge the entry and ABI lowering blocks.
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and it falls through the LLVM-IR entry
block. However, with such representation, we end up with two basic
blocks that are not maximal.

Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constants hoisting into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).

llvm-svn: 289891
2016-12-15 23:32:25 +00:00
David Blaikie 3e3eb33ed7 DebugInfo: Emit ranges for functions with DISubprograms but lacking locations on any instructions
This seems more consistent, and helps tidy up/simplify some other code
in this change.

llvm-svn: 289889
2016-12-15 23:17:52 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Yichao Yu 8f8cdd00da Fix R_AARCH64_MOVW_UABS_G3 relocation
Summary: The relocation is missing mask so an address that has non-zero bits in 47:43 may overwrite the register number. (Frequently shows up as target register changed to `xzr`....)

Reviewers: t.p.northover, lhames

Subscribers: davide, aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27609

llvm-svn: 289880
2016-12-15 22:36:53 +00:00
Matt Arsenault 327188aa15 AMDGPU: Select branch on undef to uniform scc branch
llvm-svn: 289877
2016-12-15 21:57:11 +00:00
Teresa Johnson 68e58b4b60 [gold] Add datalayout to test where it was missing
Needed due to change to require datalayout (r289719).

Found this in my own testing, maybe there aren't any bots using a v1.12
gold yet.

llvm-svn: 289876
2016-12-15 21:42:56 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjoy Das 6698c15cb6 [Verifier] Allow TBAA metadata on atomicrmw and atomiccmpxchg
This used to be allowed before r289402 by default (before r289402 you
could have TBAA metadata on any instruction), and while I'm not sure
that it helps, it does sound reasonable enough to not fail the verifier
and we have out-of-tree users who use this.

llvm-svn: 289872
2016-12-15 21:23:44 +00:00
Ehsan Amiri 1aa3ef9268 [PPC] Use CHECK-DAG instead of CHECK in the testcase
This test is currently sensitive to scheduling. Using CHECK-DAG allows us to
preserve the main purpose of the test and remove this sensivity.

In preparation to commit Power9 processor model.

llvm-svn: 289869
2016-12-15 20:51:09 +00:00
Matt Arsenault 0b386360c5 AMDGPU: Fix asserting on returned tail calls
llvm-svn: 289868
2016-12-15 20:50:12 +00:00
Matt Arsenault 0e8a299f19 AMDGPU: Assembler support for vintrp instructions
llvm-svn: 289866
2016-12-15 20:40:20 +00:00
Matthew Simpson 6a98bcfe33 [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Peter Collingbourne e089554c8f LibDriver: Allow resource files to be archive members.
It seems pointless to add a resource to an archive because it won't have
any symbols to link against (and link.exe doesn't have an equivalent of
--whole-archive), but lib.exe allows it for some reason.

llvm-svn: 289859
2016-12-15 19:37:46 +00:00
Sanjay Patel d640641a61 [InstCombine] add folds for icmp (smin X, Y), X
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. 
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
2016-12-15 19:13:37 +00:00
Sanjay Patel a97358bc8e [x86] use a single shufps for 256-bit vectors when it can save instructions
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289846
2016-12-15 18:43:46 +00:00
Matthew Simpson 2c8de192a1 [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D27677

llvm-svn: 289845
2016-12-15 18:36:59 +00:00
Teresa Johnson 1b859a2306 [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Sanjay Patel a0d8a278a7 [x86] use a single shufps when it can save instructions
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]

  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential 
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.

So the test case diffs all appear to be improvements except one test in 
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate 
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

llvm-svn: 289837
2016-12-15 18:03:38 +00:00
Simon Pilgrim 7522f54feb [X86][SSE] Fix domains for scalar store instructions
As discussed on D27692

llvm-svn: 289834
2016-12-15 17:09:24 +00:00
Robert Lougher 6ea759a83e Revert "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
Reverting as it is causing buildbot failures (address sanitizer).

llvm-svn: 289833
2016-12-15 16:59:13 +00:00