Commit Graph

98123 Commits

Author SHA1 Message Date
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Craig Topper 33c544bdb0 [X86] Add Intel Kaby Lake model numbers to getHostCPUName aliased to "skylake" since there are no feature differences.
Model numbers found here http://www.sandpile.org/x86/cpuid.htm

llvm-svn: 291086
2017-01-05 05:57:27 +00:00
Saleem Abdulrasool 6252bd8eac MC: support passing search paths to the IAS
This is needed to support inclusion in inline assembly via the
`.include` directive.

llvm-svn: 291085
2017-01-05 05:56:39 +00:00
Craig Topper 1ab35fa7a8 [X86] Change getHostCPUName to report Intel model 0x4e as "skylake" instead of "skylake-avx512". Add the proper 0x55 model for "skylake-avx512".
Summary:
Intel's i5-6300U CPU is reporting to have a model id of 78 (4e).
The Host detection assumes that to be Skylake Xeon (with AVX512 support),
instead of a normal Skylake machine.

Patch by: Valentin Churavy

Reviewers: nalimilan, craig.topper

Subscribers: hfinkel, tkelman, craig.topper, nalimilan, llvm-commits

Differential Revision: https://reviews.llvm.org/D28221

llvm-svn: 291084
2017-01-05 05:47:29 +00:00
Kostya Serebryany 2648243ebd [libFuzzer] use /tmp (or $TMPDIR, if present) to store temp files during merge
llvm-svn: 291078
2017-01-05 04:32:19 +00:00
Peter Collingbourne b2ce2b6805 IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests.
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).

Differential Revision: https://reviews.llvm.org/D28041

llvm-svn: 291069
2017-01-05 03:39:00 +00:00
Richard Smith d4d575b955 Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")
This caused buildbot failures due to returning ArrayRefs referencing local
(temporary) objects.

llvm-svn: 291067
2017-01-05 03:13:10 +00:00
Wolfgang Pieb ce13e716c5 [DWARF] Null out the debug locs of load instructions that have been moved by GVN
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.

Differential Revision: https://reviews.llvm.org/D27857

llvm-svn: 291037
2017-01-04 23:58:26 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Mehdi Amini 867aad1359 Change BitstreamCursor::skipRecord to return the record code (NFC)
llvm-svn: 291026
2017-01-04 22:54:14 +00:00
Matt Arsenault 6796d7ea8b AMDGPU: Remove unneccessary intermediate vector
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
David Blaikie 7ad9dc11db Reapply "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr""
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).

This recommits 291006, reverted in r291007.

llvm-svn: 291016
2017-01-04 22:36:33 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-tosint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts r279223 behavior and adds more tests and
documentations.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Chad Rosier 63687e40bc [AArch64] Update the feature set for Qualcomm's Falkor CPU.
llvm-svn: 291010
2017-01-04 21:26:23 +00:00
Nirav Dave 0f9d111f97 [AArch64] Fix over-eager early-exit in load-store combiner
Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28251

llvm-svn: 291008
2017-01-04 21:21:46 +00:00
David Blaikie 6e2207a134 Revert "Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr"
Breaks Clang's use of bitcode. Reverting until I have a fix to go with
it there.

This reverts commit r291006.

llvm-svn: 291007
2017-01-04 21:19:28 +00:00
David Blaikie daff78cd87 Make BitCodeAbbrev ownership explicit using shared_ptr rather than IntrusiveRefCntPtr
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).

llvm-svn: 291006
2017-01-04 21:13:35 +00:00
Hal Finkel b2f951d87a [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 291003
2017-01-04 21:05:13 +00:00
Daniel Berlin 6cc5e44068 NewGVN: Track the maximum number of iterations GVN takes on any function, so we can pinpoint performance issues.
llvm-svn: 291002
2017-01-04 21:01:02 +00:00
Davide Italiano 6309895770 [lib/LTO] Simplify logic removing set but unused variable. NFCI.
Reported by David Binderman and ack'ed by Teresa on IRC.
PR: 31527

llvm-svn: 291000
2017-01-04 20:37:57 +00:00
Peter Collingbourne efdff71b05 YAML: Remove Input::MapHNode::isValidKey(), use llvm::is_contained() instead. NFC.
llvm-svn: 290999
2017-01-04 20:10:43 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Eric Christopher 0192e97911 Remove dead variable Len.
Fixes PR31528

llvm-svn: 290995
2017-01-04 19:47:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Robert Lougher 5bf0416f45 Reapply "[SimplifyCFG] In sinkLastInstruction correctly set debugloc of common inst"
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer).  The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).

Original commit message:

Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.

Original review: https://reviews.llvm.org/D27590

llvm-svn: 290973
2017-01-04 17:40:32 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Nemanja Ivanovic c08b90d08f [PowerPC] Add identification for POWER8NVL
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.

llvm-svn: 290961
2017-01-04 13:58:09 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Bjorn Pettersson 3c6ce733f5 Fix for InlineSpiller accessing not updated dom tree base information.
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.

The fault was introduced in r266162.

I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().

Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>

Reviewers: wmi, qcolombet

Subscribers: llvm-commits, bjope, uabelho

Differential Revision: https://reviews.llvm.org/D27983

llvm-svn: 290950
2017-01-04 09:41:56 +00:00
Nitesh Jain b0bc573ca8 [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKS
Reviewers: sdardis, vkalintiris

Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841

llvm-svn: 290949
2017-01-04 09:34:37 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Peter Collingbourne 87dd2ab000 Support: Add YAML I/O support for custom mappings.
This will be used to YAMLify parts of the module summary.

Differential Revision: https://reviews.llvm.org/D28014

llvm-svn: 290935
2017-01-04 03:51:36 +00:00
David Majnemer cb892e9066 [InstCombine] Move casts around shift operations
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.

llvm-svn: 290928
2017-01-04 02:21:34 +00:00
David Majnemer 022d2a563b [InstCombine] Combine adds across a zext
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))

This is only possible if C2 is negative and C2 is greater than or equal to negative C1.

llvm-svn: 290927
2017-01-04 02:21:31 +00:00
Eugene Zelenko b2ca1b3f37 [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 290925
2017-01-04 02:02:05 +00:00
Teresa Johnson 5a8dba5bda [ThinLTO] Import type as decl only when non-null Identifier
As per post-commit review for r289993 (D27775), we can only safely
import a type as a decl if it has an Identifier, as the Name alone
is not enough to be unique across modules.

llvm-svn: 290915
2017-01-03 23:19:29 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Ahmed Bougacha 8a41319d8d [CodeGen] Further simplify returned call operand logic. NFC.
As Pete points out in r290905, CallSite lets us avoid duplicating this!

llvm-svn: 290909
2017-01-03 21:42:43 +00:00
Lang Hames b198e5585e [ExecutionEngine] Fix compile errors in OProfileJITEventListener.
Allows LLVM to build with LLVM_USE_OPROFILE=True.

Patch by Mark Dewing. Thanks Mark!

llvm-svn: 290908
2017-01-03 21:39:43 +00:00
Ahmed Bougacha 6aff744e7c [CodeGen] Simplify logic that looks for returned call operands. NFC-ish.
Use getReturnedArgOperand() instead of rolling our own.  Note that it's
equivalent because there can only be one 'returned' operand.

The existing code was also incorrect: there already was awkward logic to
ignore callee/EH blocks, but operands can now also be operand bundles,
in which case we'll look for non-existent parameter attributes.

Unfortunately, this isn't observable in-tree, as it only crashes when
exercising the regular call lowering logic with operand bundles.
Still, this is a nice small cleanup anyway.

llvm-svn: 290905
2017-01-03 20:33:22 +00:00
Kostya Serebryany 4986e819dc [libFuzzer] disable -print_pcs by default (was enabled by mistake)
llvm-svn: 290899
2017-01-03 18:51:28 +00:00
Michal Gorny 21c12044d2 [ADT] APFloatBase: Prevent collapsing semPPCDoubleDouble and semBogus
Provide a distinct contents for semBogus and semPPCDoubleDouble in order
to prevent compilers from collapsing them to a single memory address,
while we heavily rely on every semantic having distinct address.

This happens if insecure optimization collapsing identical values is
enabled. As a result, APFloats of semBogus are indistinguishable from
semPPCDoubleDouble -- and whenever the move constructor is used, the old
value beings being incorrectly recognized as a semPPCDoubleDouble.

Since the values in semPPCDoubleDouble are not used anywhere,
we can easily solve this issue via altering the value of one of the
fields and therefore ensuring that the collapse can not occur.

Differential Revision: https://reviews.llvm.org/D28112

llvm-svn: 290896
2017-01-03 16:33:50 +00:00