Commit Graph

39601 Commits

Author SHA1 Message Date
Matt Arsenault 0efdd06b22 AMDGPU: Run LoadStoreVectorizer pass by default
llvm-svn: 281112
2016-09-09 22:29:28 +00:00
Matt Arsenault 950a82047b LSV: Fix incorrectly increasing alignment
If the unaligned access has a dynamic offset, it may be odd which
would make the adjusted alignment incorrect to use.

llvm-svn: 281110
2016-09-09 22:20:14 +00:00
Davide Italiano b4d0521c92 [gold] Test that we handle invalid directory correctly.
I had this test sitting around for a while but always forgot to
commit. Rafael reviewed it a while ago.

Differential Revision:  https://reviews.llvm.org/D19207

llvm-svn: 281109
2016-09-09 22:14:57 +00:00
Sanjay Patel 58109abe91 [InstCombine] use m_APInt to allow icmp ult X, C folds for splat constant vectors
llvm-svn: 281107
2016-09-09 21:59:37 +00:00
Simon Pilgrim a3d1e03cd7 [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targets
Use getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets

llvm-svn: 281105
2016-09-09 21:47:21 +00:00
Krzysztof Parzyszek 73e0ad8220 [Hexagon] Fix disassembler crash after r279255
When p0 was added as an explicit operand to the duplex subinstructions,
the disassembler was not updated to reflect this.

llvm-svn: 281104
2016-09-09 21:45:00 +00:00
Michael Kuperstein b1153848ff [X86] Regenerate test. NFC.
llvm-svn: 281099
2016-09-09 21:36:17 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumption about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Justin Lebar b5e884976b [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.
Summary:
Previously these only worked via NVPTX-specific intrinsics.

This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.

It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer

Subscribers: llvm-commits, jholewinski, tra

Differential Revision: https://reviews.llvm.org/D24300

llvm-svn: 281092
2016-09-09 21:07:26 +00:00
Wei Ding 06f8d39424 AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.
Differential Revision: http://reviews.llvm.org/D23700

llvm-svn: 281081
2016-09-09 19:31:51 +00:00
Tom Stellard b2869eb6e9 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

llvm-svn: 281080
2016-09-09 19:28:00 +00:00
Zachary Turner 36efbfa6d8 [pdb] Print out some more info when dumping a raw stream.
We have various command line options that print the type of a
stream, the size of a stream, etc but nowhere that it can all be
viewed together.

Since a previous patch introduced the ability to dump the bytes
of a stream, this seems like a good place to present a full view
of the stream's properties including its size, what kind of data
it represents, and the blocks it occupies.  So I added the
ability to print that information to the -stream-data command
line option.

llvm-svn: 281077
2016-09-09 19:00:49 +00:00
Dehao Chen 22ce5eb051 Do not widen load for different variable in GVN.
Summary:
Widening load in GVN is too early because it will block other optimizations like PRE, LICM.

https://llvm.org/bugs/show_bug.cgi?id=29110

The SPECCPU2006 benchmark impact of this patch:

Reference: o2_nopatch
(1): o2_patched

           Benchmark             Base:Reference   (1)  
-------------------------------------------------------
spec/2006/fp/C++/444.namd                  25.2  -0.08%
spec/2006/fp/C++/447.dealII               45.92  +1.05%
spec/2006/fp/C++/450.soplex                41.7  -0.26%
spec/2006/fp/C++/453.povray               35.65  +1.68%
spec/2006/fp/C/433.milc                   23.79  +0.42%
spec/2006/fp/C/470.lbm                    41.88  -1.12%
spec/2006/fp/C/482.sphinx3                47.94  +1.67%
spec/2006/int/C++/471.omnetpp             22.46  -0.36%
spec/2006/int/C++/473.astar               21.19  +0.24%
spec/2006/int/C++/483.xalancbmk           36.09  -0.11%
spec/2006/int/C/400.perlbench             33.28  +1.35%
spec/2006/int/C/401.bzip2                 22.76  -0.04%
spec/2006/int/C/403.gcc                   32.36  +0.12%
spec/2006/int/C/429.mcf                   41.04  -0.41%
spec/2006/int/C/445.gobmk                 26.94  +0.04%
spec/2006/int/C/456.hmmer                  24.5  -0.20%
spec/2006/int/C/458.sjeng                    28  -0.46%
spec/2006/int/C/462.libquantum            55.25  +0.27%
spec/2006/int/C/464.h264ref               45.87  +0.72%

geometric mean                                   +0.23%

For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400.

Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames

Subscribers: gberry, junbuml

Differential Revision: https://reviews.llvm.org/D24096

llvm-svn: 281074
2016-09-09 18:42:35 +00:00
Vedant Kumar db7f924c03 [llvm-cov] Try to fix the native_separators.c test some more
It's still breaking this bot (though, it looks like it always had been):

  http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015

This time, add quotes around llvm-{cov,config} so that lit won't expand
them.

Thanks to Reid for suggesting the patch!

llvm-svn: 281072
2016-09-09 18:34:43 +00:00
Zachary Turner 72c5b6451f [pdb] Add command line options for dumping individual streams and blocks
I ran into a situation where I wanted to print out the contents of
page 6 of a PDB as a binary blob, and there was no straightforward
way to do that.

In addition to adding that, this patch also adds the ability to dump
a stream by index as a binary blob, and it will stitch together all
the blocks and dump the whole thing as one seemingly contiguous
sequence of bytes.

llvm-svn: 281070
2016-09-09 18:17:52 +00:00
Zachary Turner c6d54da891 [pdb] Write PDB TPI Stream from Yaml.
This writes the full sequence of type records described in
Yaml to the TPI stream of the PDB file.

Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D24316

llvm-svn: 281063
2016-09-09 17:46:17 +00:00
Reid Kleckner 1076288e22 [codeview] Don't assert if the array element type is incomplete
This can happen when the frontend knows the debug info will be emitted
somewhere else. Usually this happens for dynamic classes with out of
line constructors or key functions, but it can also happen when modules
are enabled.

llvm-svn: 281060
2016-09-09 17:29:36 +00:00
Vedant Kumar e32490933e [Bitcode] Add compatibility test for the 3.9 release
Fork off compatibility.ll for the 3.9 release. The *.bc file in this
commit was produced using a Release build of the release_39 branch.

llvm-svn: 281059
2016-09-09 17:24:31 +00:00
Sanjay Patel 6da0fb8c74 [InstCombine] add tests to show pattern matching failures due to commutation
I was looking to fix a bug in getComplexity(), and these cases showed up as 
obvious failures. I'm not sure how to find these in general though.

llvm-svn: 281055
2016-09-09 16:35:20 +00:00
Sam Kolton 1eeb11bfd4 AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

llvm-svn: 281050
2016-09-09 14:44:04 +00:00
Chris Dewhurst c59f7c745b [Sparc][LEON] Removed the parts of the errata fixes implemented using inline assembly as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly.
llvm-svn: 281047
2016-09-09 14:16:51 +00:00
James Molloy 57d9dfa9ac [ARM] ADD with a negative offset can become SUB for free
So model that directly in TTI::getIntImmCost().

llvm-svn: 281044
2016-09-09 13:35:36 +00:00
James Molloy 1454e90f86 [ARM] icmp %x, -C can be lowered to a simple ADDS or CMN
Tell TargetTransformInfo about this so ConstantHoisting is informed.

llvm-svn: 281043
2016-09-09 13:35:28 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
James Molloy 4d86bed0bb [Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)
The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2.

llvm-svn: 281040
2016-09-09 12:52:24 +00:00
Tim Northover 25d1286e5a GlobalISel: remove G_TYPE and G_PHI
These instructions were only necessary when type information was stored in the
MachineInstr (because only generic MachineInstrs possessed a type). Now that
it's in MachineRegisterInfo, COPY and PHI work fine.

llvm-svn: 281037
2016-09-09 11:47:31 +00:00
Tim Northover 0f140c769a GlobalISel: move type information to MachineRegisterInfo.
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).

llvm-svn: 281035
2016-09-09 11:46:34 +00:00
Simon Dardis ba92b034bf Revert "[mips] Fix c.<cc>.<fmt> instruction definition."
This reverts commit r281022. Mips buildbot broke, due to unhandled register
class FCC.

llvm-svn: 281033
2016-09-09 11:06:01 +00:00
Sam Kolton a2e5c88baf [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

llvm-svn: 281028
2016-09-09 10:08:02 +00:00
James Molloy 0f41227b21 [Thumb1] Teach optimizeCompareInstr about thumb1 compares
This avoids us doing a completely unneeded "cmp r0, #0" after a flag-setting instruction if we only care about the Z or C flags.

Add LSL/LSR to the whitelist while we're here and add testing. This code could really do with a spring clean.

llvm-svn: 281027
2016-09-09 09:51:06 +00:00
Simon Dardis 8efa979029 [mips] Fix c.<cc>.<fmt> instruction definition.
As part of this effort, remove MipsFCmp nodes and use tablegen
patterns rather than custom lowering through C++.

Unexpectedly, this improves codesize for microMIPS as previous floating
point setcc expansions would materialize 0 and 1 into GPRs before using
the relevant mov[tf].[sd] instruction. Now $zero is used directly.

Reviewers: dsanders, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D23118

llvm-svn: 281022
2016-09-09 09:22:52 +00:00
Chris Dewhurst ddad6e028e [Sparc][LEON] Unit test for CASA instruction supported by some LEON processors added.
llvm-svn: 281021
2016-09-09 09:08:13 +00:00
Gor Nishanov faf36c2e0b [Coroutines] Part13: Handle single edge PHINodes across suspends
Summary:
If one of the uses of the value is a single edge PHINode, handle it.

Original:

    %val = something
    <suspend>
    %p = PHINode [%val]

After Spill + Part13:

    %val = something
    %slot = gep val.spill.slot
    store %val, %slot
    <suspend>
    %p = load %slot

Plus tiny fixes/changes:
   * use correct index for coro.free in CoroCleanup
   * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24242

llvm-svn: 281020
2016-09-09 05:39:00 +00:00
Craig Topper 149e6bdc16 [AVX-512] Add VPCMP instructions to the load folding tables and make them commutable.
llvm-svn: 281013
2016-09-09 01:36:10 +00:00
Craig Topper 1a1ac11625 [AVX-512] Add more integer vector comparison tests with loads. Some of these show opportunities where we can commute to fold loads.
Commutes will be added in a followup commit.

llvm-svn: 281012
2016-09-09 01:36:04 +00:00
Vedant Kumar a59334da6b [llvm-cov] Emit a summary in the report directory's index
llvm-cov writes out an index file in '-output-dir' mode, albeit not a
very informative one. Try to fix that by using the CoverageReport API to
include some basic summary information in the index file.

llvm-svn: 281011
2016-09-09 01:32:55 +00:00
Vedant Kumar 935bf9a890 [llvm-cov] Speculate fix for a Windows-only test (NFC)
This test should have broken after r280896. Fix up the test case
speculatively, since I don't have a way to test it.

I wonder why I didn't get any angry bot emails about this. Maybe none of
the win32 bots test llvm-cov? That could explain it, since the test says
it 'REQUIRES: system-windows', which is restricted to win32 hosts.

Also: why is 'system-windows' not defined for non-win32 Windows bots?
llvm-svn: 281008
2016-09-09 01:32:47 +00:00
Michael Kuperstein ceff022800 [X86] Add more baseline tests for "irregular" shuffles. NFC.
This adds more tests for shuffles where the output width does not match
the input width and/or the output is generated from more than two inputs.

llvm-svn: 281005
2016-09-09 00:49:29 +00:00
Hans Wennborg c39ef776fc Win64: Don't use REX prefix for direct tail calls
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.

Differential Revision: https://reviews.llvm.org/D24359

llvm-svn: 281003
2016-09-08 23:35:10 +00:00
Dehao Chen 87823f8e4d Remove debug info when hoisting instruction from then/else branch.
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.

Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo

Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24164

llvm-svn: 280995
2016-09-08 21:53:33 +00:00
Sanjay Patel a4c6223319 [InstCombine] regenerate checks
llvm-svn: 280993
2016-09-08 21:40:21 +00:00
Matthew Simpson bfe5e1817b [LV] Ensure proper handling of multi-use case when collecting uniforms
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.

llvm-svn: 280992
2016-09-08 21:38:26 +00:00
Sanjay Patel ed9fda01a3 [InstCombine] regenerate checks
llvm-svn: 280991
2016-09-08 21:32:21 +00:00
Krzysztof Parzyszek a1218728d3 [RDF] Further improve handling of multiple phis reached from shadows
llvm-svn: 280987
2016-09-08 20:48:42 +00:00
Vedant Kumar 0b33f2c003 [llvm-cov] Fix issues with segment highlighting in the html view
The text and html coverage views take different approaches to emitting
highlighted regions. That's because this problem is easier in the text
view: there's no need to worry about escaping text or adding tooltip
content to a highlighted snippet.

Unfortunately, the html view didn't get region highlighting quite right.

This patch fixes the situation, bringing parity between the two views.

llvm-svn: 280981
2016-09-08 19:18:23 +00:00
Matthew Simpson 408a3abcfe [LV] Don't mark pointers used by scalarized memory accesses uniform
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.

This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.

Differential Revision: https://reviews.llvm.org/D24271

llvm-svn: 280979
2016-09-08 19:11:07 +00:00
Krzysztof Parzyszek a696b1b641 [Hexagon] Expand sext- and zextloads of vector types, not just extloads
Recent change exposed this issue, breaking the Hexagon buildbots.

llvm-svn: 280973
2016-09-08 17:42:14 +00:00
Matt Arsenault be90f70d3a AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
llvm-svn: 280972
2016-09-08 17:35:41 +00:00
Eric Christopher 98ddbdb563 AArch64 .arch directive - Include default arch attributes with extensions.
Fix the .arch asm parser to use the full set of features for the architecture
and any extensions on the command line. Add and update testcases accordingly
as well as add an extension that was used but not supported.

llvm-svn: 280971
2016-09-08 17:27:03 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Renato Golin 049f387112 Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"
And associated commits, as they broke the Thumb bots.

This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.

llvm-svn: 280967
2016-09-08 17:10:39 +00:00
Dehao Chen ebb715b119 Add unittest for r280760
llvm-svn: 280963
2016-09-08 16:53:40 +00:00
Simon Pilgrim 073472a2bf [InstCombine][X86] Regenerate masked memory op combine tests
llvm-svn: 280960
2016-09-08 16:32:37 +00:00
Simon Pilgrim cd7b2830b9 [InstCombine][X86] Regenerate vperm2f128/vperm2i128 combine tests
llvm-svn: 280959
2016-09-08 16:30:46 +00:00
Simon Pilgrim 3b1ecbe66c [InstCombine][X86] Regenerate insertps combine tests
llvm-svn: 280957
2016-09-08 16:15:21 +00:00
Sam Kolton 1b746d1b9d [TableGen] AsmMatcher: Add AsmVariantName to Instruction class.
Summary:
This allows specifying instructions that are available only in specific assembler variant. If AsmVariantName is specified then instruction will be presented only in MatchTable for this variant. If not specified then assembler variants will be determined based on AsmString.
Also this allows splitting assembler match tables in same way as it is done in dissasembler.

Reviewers: ab, tstellarAMD, craig.topper, vpykhtin

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D24249

llvm-svn: 280952
2016-09-08 15:50:52 +00:00
Reid Kleckner cb5c98b14a Give an x86 assembler test a triple
llvm-svn: 280950
2016-09-08 15:40:43 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
James Molloy 753c18f5c0 [Thumb1] AND with a constant operand can be converted into BIC
So model the cost of materializing the constant operand C as the minimum of
C and ~C.

llvm-svn: 280929
2016-09-08 12:58:12 +00:00
James Molloy 7c7255e40b [Thumb1] Fix cost calculation for complemented immediates
Materializing something like "-3" can be done as 2 instructions:
  MOV r0, #3
  MVN r0, r0

This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256.

There were no tests failing after changing this... :/

llvm-svn: 280928
2016-09-08 12:58:04 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Pablo Barrio 2b7ed1339c Revert "[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)"
This reverts commit r280808.

It is possible that this change results in an infinite loop. This
is causing timeouts in some tests on ARM, and a Chromebook bot is
failing.

llvm-svn: 280918
2016-09-08 10:05:57 +00:00
Hrvoje Varga dbe4d96b4f [mips][microMIPS] Implement DBITSWAP, DLSA and LWUPC and add tests for AUI instructions
Differential Revision: https://reviews.llvm.org/D16452

llvm-svn: 280909
2016-09-08 07:41:43 +00:00
Vitaly Buka 58a81c6540 [asan] Avoid lifetime analysis for allocas with can be in ambiguous state
Summary:
C allows to jump over variables declaration so lifetime.start can be
avoid before variable usage. To avoid false-positives on such rare cases
we detect them and remove from lifetime analysis.

PR27453
PR28267

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24321

llvm-svn: 280907
2016-09-08 06:27:58 +00:00
Michael Zolotukhin e72997a524 Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues."
This reverts commit r280901.

This caused a bunch of failures, reverting it until I investigate them.

llvm-svn: 280905
2016-09-08 03:51:30 +00:00
Michael Zolotukhin 5e0a20697e [LoopUnroll] Properly update loop-info when cloning prologues and epilogues.
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to an incorrect loop info when we unrolled a loop with
a child loop - in this case created prologue-loop had a child loop, but
loop info didn't reflect that.

This fixes PR28888.

Reviewers: chandlerc, sanjoy, hfinkel

Subscribers: llvm-commits, silvas

Differential Revision: https://reviews.llvm.org/D24203

llvm-svn: 280901
2016-09-08 01:52:26 +00:00
Vedant Kumar c84921774f [llvm-cov] Disable zlib compression in a test input, unbreaks bots
Disable name compression in the inputs used to produce
multiple-files.covmapping. Should fix bots which don't compile with
zlib:

  http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/19610/steps/test/logs/stdio

llvm-svn: 280898
2016-09-08 01:19:29 +00:00
Vedant Kumar 0053c0b679 [llvm-cov] Use less space to describe source names
In r279628, we made SourceCoverageView list the binary associated with a
view and started adding labels (e.g "Source: foo" or "Function: bar") to
everything. Condense this information a bit to unclutter reports.

llvm-svn: 280896
2016-09-08 00:56:48 +00:00
Vedant Kumar fa75437183 [llvm-cov] Drop the longest common filename prefix from summaries
Remove the longest common prefix from filenames when printing coverage
summaries. This makes them easier to compare.

llvm-svn: 280895
2016-09-08 00:56:43 +00:00
Michael Kuperstein f79af6f8c4 [CGP] Be less conservative about tail-duplicating a ret to allow tail calls
CGP tail-duplicates rets into blocks that end with a call that feed the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.

Instead, we can use the same logic the builder uses to determine whether the
attributes work out.

Differential Revision: https://reviews.llvm.org/D24315

llvm-svn: 280894
2016-09-08 00:48:37 +00:00
Dean Michael Berris 17d94e279e [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 280888
2016-09-08 00:19:04 +00:00
Piotr Padlewski 6b96c15b83 Deleted right file
llvm-svn: 280887
2016-09-07 23:46:52 +00:00
Piotr Padlewski 3a0579f699 Revert "[thinlto] Deleted unused test file"
This reverts commit a7ad00460027c4a92640c2a5706a7d1869b60989.

llvm-svn: 280886
2016-09-07 23:46:50 +00:00
Vitaly Buka c5e53b2a53 Revert "[asan] Avoid lifetime analysis for allocas with can be in ambiguous state"
Fails on Windows.

This reverts commit r280880.

llvm-svn: 280883
2016-09-07 23:37:15 +00:00
Piotr Padlewski e66bfd6bd6 [thinlto] Deleted unused test file
Summary:
This file should be referenced from
thinlto-function-summary-callgraph-pgo.ll file,
but someone forgot to use it there. Everything worked because
we store pgo data about callsite blocks, so there is no need to have
pgo count of @func.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24309

llvm-svn: 280882
2016-09-07 23:35:46 +00:00
Vitaly Buka 2ca05b07d6 [asan] Avoid lifetime analysis for allocas with can be in ambiguous state
Summary:
C allows to jump over variables declaration so lifetime.start can be
avoid before variable usage. To avoid false-positives on such rare cases
we detect them and remove from lifetime analysis.

PR27453
PR28267

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24321

llvm-svn: 280880
2016-09-07 23:18:23 +00:00
Sanjay Patel 9b40f98357 [InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors
llvm-svn: 280873
2016-09-07 22:33:03 +00:00
Hal Finkel ac5803ba91 [SimplifyCFG] Don't try to create metadata-valued PHIs
We can't create metadata-valued PHIs; don't try to do so when sinking.

I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).

Previously, running the test case would crash with:

  Invalid use of metadata!
    %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
  LLVM ERROR: Broken function found, compilation aborted!

llvm-svn: 280866
2016-09-07 21:38:22 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While node legalization we tried to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00
Sanjay Patel def931e76a [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.

llvm-svn: 280861
2016-09-07 20:50:44 +00:00
Krzysztof Parzyszek 2db0c8b75f [RDF] Fix liveness analysis for phi nodes with shadow uses
Shadow uses need to be analyzed together, since each individual shadow
will only have a partial reaching def. All shadows together may cover
a given register ref, while each individual shadow may not.

llvm-svn: 280855
2016-09-07 20:37:05 +00:00
Wei Mi b96ebe6a1a Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more meaningful, NFC.
Add PR number and comment in pr30298.ll to explain what is testing.

llvm-svn: 280843
2016-09-07 18:46:15 +00:00
Wei Mi f100d4e93d Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

llvm-svn: 280837
2016-09-07 18:22:17 +00:00
Hans Wennborg 6c45f9233c Add more triple to conditional-tailcall.ll test
llvm-svn: 280835
2016-09-07 18:19:31 +00:00
Saleem Abdulrasool 02d9851c1c CodeGen: ensure that libcalls are always AAPCS CC
The original commit was too aggressive about marking LibCalls as AAPCS.  The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.

llvm-svn: 280833
2016-09-07 17:56:09 +00:00
Hans Wennborg 75e25f6812 X86: Fold tail calls into conditional branches where possible (PR26302)
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.

Example:

  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }

before:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     .LBB0_2
          jmp     foo
  .LBB0_2:
          jmp     bar

after:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     bar
  .LBB0_1:
          jmp     foo

I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.

Differential Revision: https://reviews.llvm.org/D24108

llvm-svn: 280832
2016-09-07 17:52:14 +00:00
Davide Italiano ec9612da1a [lib/LTO] Add a way to run a custom pipeline
Differential Revision:  https://reviews.llvm.org/D24095

llvm-svn: 280830
2016-09-07 17:46:16 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Reid Kleckner a9f4cc9510 [codeview] Add new directives to record inlined call site line info
Summary:
Previously we were trying to represent this with the "contains" list of
the .cv_inline_linetable directive, which was not enough information.
Now we directly represent the chain of inlined call sites, so we know
what location to emit when we encounter a .cv_loc directive of an inner
inlined call site while emitting the line table of an outer function or
inlined call site. Fixes PR29146.

Also fixes PR29147, where we would crash when .cv_loc directives crossed
sections. Now we write down the section of the first .cv_loc directive,
and emit an error if any other .cv_loc directive for that function is in
a different section.

Also fixes issues with discontiguous inlined source locations, like in
this example:

  volatile int unlikely_cond = 0;
  extern void __declspec(noreturn) abort();
  __forceinline void f() {
    if (!unlikely_cond) abort();
  }
  int main() {
    unlikely_cond = 0;
    f();
    unlikely_cond = 0;
  }

Previously our tables gave bad location information for the 'abort'
call, and the debugger wouldn't snow the inlined stack frame for 'f'.
It is important to emit good line tables for this code pattern, because
it comes up whenever an asan bug occurs in an inlined function. The
__asan_report* stubs are generally placed after the normal function
epilogue, leading to discontiguous regions of inlined code.

Reviewers: majnemer, amccarth

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24014

llvm-svn: 280822
2016-09-07 16:15:31 +00:00
Justin Lebar 3a5f40c191 [LSV] Use the original loads' names for the extractelement instructions.
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions.  This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.

Reviewers: asbirlea, tstellarAMD

Subscribers: arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D23748

llvm-svn: 280818
2016-09-07 15:49:48 +00:00
Simon Pilgrim d311311beb Fix typo in test - it should be masking bits0-15 not bit16
llvm-svn: 280816
2016-09-07 15:19:07 +00:00
Andrea Di Biagio bdd576dbb0 Regenerate vector bitcast folding tests using update_test_checks.py.
Two tests have been merged together, regenerated and then moved to
a more appropriate directory. No functional change.

llvm-svn: 280814
2016-09-07 14:50:07 +00:00
Simon Pilgrim 32cfa5ba83 [X86][SSE] Added or combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280813
2016-09-07 14:49:50 +00:00
Simon Pilgrim 65cdc058b6 [X86][SSE] Added and+or+zext combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280810
2016-09-07 14:00:52 +00:00
Simon Pilgrim 2415144425 [X86][SSE] Added and+or combine tests currently failing with vectors
(and (or x, C), D) -> D if (C & D) == D

Part of the yak shaving for D24253

llvm-svn: 280809
2016-09-07 13:40:03 +00:00
Pablo Barrio fc752bb70a [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)
Summary:
This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.

Reviewers: scott-0, jmolloy, compnerd, rengolin

Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits

Differential Revision: https://reviews.llvm.org/D24133

llvm-svn: 280808
2016-09-07 12:49:15 +00:00
Andrea Di Biagio f3fd316223 [InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.

llvm-svn: 280807
2016-09-07 12:47:53 +00:00
Andrea Di Biagio 8df5b9cf48 [InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi

The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.

This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.

Added reproducible test cases to x86-sse4a.ll.

Differential Revision: https://reviews.llvm.org/D24256

llvm-svn: 280804
2016-09-07 12:03:03 +00:00
Vasileios Kalintiris 1ed49fd384 [mips] Disable the TImode shift libcalls for 32-bit targets.
Summary:
The o32 ABI doesn't not support the TImode helpers. For the time being,
disable just the shift libcalls as they break recursive builds on MIPS.

Reviewers: sdardis

Subscribers: llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D24259

llvm-svn: 280798
2016-09-07 10:01:18 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Hal Finkel 42c83f131e [PowerPC] Fix address-offset folding for plain addi
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).

On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.

I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).

Fixes PR30280.

llvm-svn: 280789
2016-09-07 07:36:11 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover AVX-512F, KNL set.
FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000))).
It happens because FP XOR is not supported for 512-bit data types on KNL and we use integer XOR instead.
I added pattern match for integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper b880ad3a71 [AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input.
llvm-svn: 280781
2016-09-07 04:46:11 +00:00
Saleem Abdulrasool a7ade33d16 Revert "CodeGen: ensure that libcalls are always AAPCS CC"
This reverts SVN r280683.  Revert until I figure out why this is breaking lli
tests.

llvm-svn: 280778
2016-09-07 03:17:19 +00:00
Zachary Turner c998571493 Re-add "Make FieldList records print as a YAML sequence"
This was originally submitted in r280549, and reverted in r280577
due to breaking one MSVC buildbot.  The issue is that MSVC 2013
doesn't synthesize move constructors.  So even though i was
writing std::move(A) it was copying it, leading to a bogus ArrayRef.
The solution here is to simply remove the std::vector<> from the
type, since it is unused and unnecessary.  This way the ArrayRef
continues to point into the original memory backing the CVType.

llvm-svn: 280769
2016-09-06 23:45:47 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Ying Yi 24e91bd05f [llvm-cov] Add the project summary to the text coverage report for each source file.
This patch is a spin-off from https://reviews.llvm.org/D23922. It extends the text view to preserve the same feature as the html view.

Differential Revision: https://reviews.llvm.org/D24241

llvm-svn: 280756
2016-09-06 21:41:38 +00:00
Konstantin Zhuravlyov 864718c666 [AMDGPU] Wave and register controls
- Add missing test

llvm-svn: 280749
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Ying Yi d36b47c481 [llvm-cov] Add the "Go to first unexecuted line" feature.
This patch provides easy navigation to find the zero count lines, especially useful when the source file is very large.

Differential Revision: https://reviews.llvm.org/D23277

llvm-svn: 280739
2016-09-06 19:31:18 +00:00
Rafael Espindola b940b66c60 Add an c++ itanium demangler to llvm.
This adds a copy of the demangler in libcxxabi.

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.

The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:

    No constexpr.
    No alignas

On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi

My current plan is:

    Commit something like this
    Change lld to use it
    Change lldb to use it as the fallback

    Add a few #ifdefs so that exactly the same file can be used in
    libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.

llvm-svn: 280732
2016-09-06 19:16:48 +00:00
Krzysztof Parzyszek 7c9b012629 [RDF] Ignore undef use operands
llvm-svn: 280717
2016-09-06 17:03:13 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrate the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
Simon Dardis b432a3ed7e [mips] Tighten FastISel restrictions
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.

This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris

Differential Review: https://reviews.llvm.org/D23795

llvm-svn: 280706
2016-09-06 12:36:24 +00:00
Krzysztof Parzyszek 020ec299bf [PPC] Claim stack frame before storing into it, if no red zone is present
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it 
there is no guarantee that this part of the stack will not be modified 
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093

llvm-svn: 280705
2016-09-06 12:30:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper cf9f1b8dfa [AVX-512] Add a test case to show that we don't select masked vpermi2ps when the index operand comes from a bitcast.
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.

Taken from optimized output from clang's test case.

llvm-svn: 280695
2016-09-06 05:45:27 +00:00
Saleem Abdulrasool bfa25bd1ac ARM: workaround bundled operation predication
This is a Windows ARM specific issue.  If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up.  This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.

For now, report a bundle as being unpredicatable.  Although this is false, this
would trigger a failure case previously anyways, so this is no worse.  That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle".  This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence.  If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.

llvm-svn: 280689
2016-09-06 04:00:12 +00:00
Craig Topper 62d0a5e7d3 [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.
llvm-svn: 280684
2016-09-06 00:31:10 +00:00
Saleem Abdulrasool a6519b1d54 CodeGen: ensure that libcalls are always AAPCS CC
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683
2016-09-06 00:28:43 +00:00
Craig Topper dfc4fc9f02 [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double.
Still need to fix the register classes to allow the extended range of registers.

llvm-svn: 280682
2016-09-05 23:58:40 +00:00
Craig Topper 70e1348031 [X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add command lines for AVX and AVX512 feature sets.
llvm-svn: 280681
2016-09-05 23:58:37 +00:00
Craig Topper f54ebca2dd [X86] Update fast-isel vector load test to have more 256 and 512-bit test cases. Add a command line for SKX features too.
llvm-svn: 280680
2016-09-05 23:58:32 +00:00
Sanjay Patel e341c919b0 fix FileCheck variables for test added with r280677
The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679
2016-09-05 23:49:32 +00:00
Gor Nishanov ccabaca273 [Coroutines] Part12: Handle alloca address-taken
Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had address taken, we may end up with the code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678
2016-09-05 23:45:45 +00:00
Sanjay Patel eea2ef7862 [InstCombine] don't assert that division-by-constant has been folded (PR30281)
This is effectively a revert of:
https://reviews.llvm.org/rL280115

And this should fix
https://llvm.org/bugs/show_bug.cgi?id=30281:

llvm-svn: 280677
2016-09-05 23:38:22 +00:00
Sanjay Patel 46f9df5b71 [InstCombine] revert r280637 because it causes test failures on an ARM bot
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll

llvm-svn: 280676
2016-09-05 22:36:32 +00:00
Simon Pilgrim 6dd4b3526a [X86][SSE] Add test cases for PR29078
'Failure to recognise i64 sitofp/uitofp conversions that can be performed as i32'

llvm-svn: 280671
2016-09-05 18:11:17 +00:00
Simon Pilgrim efab350967 [X86][SSE] Add test cases for PR29079
'Failure to recognise uitofp conversions that can be performed as sitofp'

llvm-svn: 280670
2016-09-05 18:04:38 +00:00
Simon Pilgrim 5f14ff75ec [X86][SSE] Regenerate odd shuffle tests with common prefixes
llvm-svn: 280661
2016-09-05 14:15:38 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
James Molloy 728cf85950 [Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc}
These need to be mapped through to R_ARM_THM_JUMP{11,8} respectively.

Fixes PR30279.

llvm-svn: 280651
2016-09-05 08:29:15 +00:00
Igor Breger a2f8ca9a34 [AVX512] Fix v8i1 /v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits.
Differential Revision: http://reviews.llvm.org/D23983

llvm-svn: 280650
2016-09-05 08:26:51 +00:00
Craig Topper d9ca3d97ef [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.
Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

llvm-svn: 280648
2016-09-05 06:43:06 +00:00
Gor Nishanov 0e18f75a92 [Coroutines] Part11: Add final suspend handling.
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.

This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24068

llvm-svn: 280646
2016-09-05 04:44:30 +00:00
Craig Topper 42b69dd961 [X86] Add AVX and AVX512 command lines to the vec_ss_load_fold test.
llvm-svn: 280645
2016-09-05 02:20:53 +00:00
Sanjay Patel c641e9d6ff [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors
The code to calculate 'UsesRemoved' could be simplified.
As-is, that code is a victim of PR30273:
https://llvm.org/bugs/show_bug.cgi?id=30273

llvm-svn: 280637
2016-09-04 20:58:27 +00:00
Simon Pilgrim cded03f163 [X86] Regenerate x64 mmx/f64 return value tests
llvm-svn: 280634
2016-09-04 18:14:45 +00:00
Craig Topper 4177345d7f [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR.
llvm-svn: 280633
2016-09-04 18:13:33 +00:00
Lang Hames 38c7927b6f [ORC] Clone module flags metadata into the globals module in the
CompileOnDemandLayer.

Also contains a tweak to the orc-lazy jit in LLI to enable the test case.

llvm-svn: 280632
2016-09-04 17:53:30 +00:00
Simon Pilgrim 128047fde5 [X86] Regenerate trunc-store legalization test
llvm-svn: 280631
2016-09-04 17:50:03 +00:00
Simon Pilgrim c228f5c195 [X86][SSE] Regenerate fcmp/uitofp combine tests
llvm-svn: 280629
2016-09-04 17:16:01 +00:00
Igor Breger 7e2a0dfa0c revert r279960.
https://llvm.org/bugs/show_bug.cgi?id=30249

llvm-svn: 280625
2016-09-04 14:03:52 +00:00
Simon Pilgrim 9a36318c54 EOL fixes
llvm-svn: 280624
2016-09-04 13:30:46 +00:00
Dorit Nuzman abd15f69b2 [InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing
memcpy with ld/st.

When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.

Differential Revision: https://reviews.llvm.org/D23499

llvm-svn: 280617
2016-09-04 07:49:39 +00:00
Hal Finkel 73390c7acd [PowerPC] Zero-extend constants in FastISel
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.

The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).

Fixes PR27721.

llvm-svn: 280614
2016-09-04 06:07:19 +00:00
Craig Topper af0d63d2e7 [AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR.
llvm-svn: 280611
2016-09-04 02:09:53 +00:00
Joseph Tremoulet e92e0a9042 Fix inliner funclet unwind memoization
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree.  The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations.  In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination.  This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.

Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.

Fixes PR29151.


Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24117

llvm-svn: 280610
2016-09-04 01:23:20 +00:00
Xinliang David Li 241e6c7086 [Profile] preserve branch metadata lowering select in CGP
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.

Differential Revision: http://reviews.llvm.org/D24169

llvm-svn: 280600
2016-09-03 21:26:36 +00:00
Mehdi Amini ebb3434850 Fix ThinLTO crash with debug info
Because the recent change about ODR type uniquing in the context,
we can reach types defined in another module during IR linking.
This triggered some assertions in case we IR link without starting
from an empty module. To alleviate that, we can self-map metadata
defined in the destination module so that they won't be visited.

Differential Revision: https://reviews.llvm.org/D23841

llvm-svn: 280599
2016-09-03 21:12:33 +00:00
Craig Topper 907b580d72 [AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test.
llvm-svn: 280593
2016-09-03 17:20:07 +00:00
Nicolai Haehnle 3bba6a8438 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at best neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

llvm-svn: 280590
2016-09-03 12:26:38 +00:00
Nicolai Haehnle a246dccc26 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault 46a0382ab2 AMDGPU: Do basic folding of class intrinsic
This allows more of the OCML builtin library to be
constant folded.

llvm-svn: 280586
2016-09-03 07:06:58 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Craig Topper 892ce56901 [AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables.
llvm-svn: 280581
2016-09-03 04:37:50 +00:00
Nico Weber 05e78450be Revert r280549.
The test it added doesn't pass:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15318/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apdbdump-yaml-types.test

Command Output (stdout):
--
$ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\llvm-pdbdump.EXE" "pdb2yaml" "-tpi-stream" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB/Inputs/empty.pdb"
$ "D:/buildslave/clang-x64-ninja-win7/stage1/./bin\FileCheck.EXE" "-check-prefix=YAML" "D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test"
# command stderr:
D:\buildslave\clang-x64-ninja-win7\llvm\test\DebugInfo\PDB\pdbdump-yaml-types.test:36:7: error: expected string not found in input
YAML: Name: apartment
      ^
<stdin>:153:10: note: scanning from here
 Value: 161
         ^

llvm-svn: 280577
2016-09-03 03:18:49 +00:00
Hal Finkel 522e4d9d66 [PowerPC] Support asm parsing for bc[l][a][+-] mnemonics
PowerPC assembly code in the wild, so it seems, has things like this:

  bc+     12, 28, .L9

This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.

This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).

Fixes PR23646.

llvm-svn: 280571
2016-09-03 02:31:44 +00:00
Wei Mi c37307a5f4 Fix buildbot error.
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.

llvm-svn: 280568
2016-09-03 01:43:28 +00:00
Hal Finkel 28842b96f3 [PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfev
These few book-III instructions are used by the Linux kernel.

Partially fixes PR24796.

llvm-svn: 280560
2016-09-02 23:42:01 +00:00
Hal Finkel 277736eee6 [PowerPC] Add support for the extended dcbf form and mnemonics
dcbf has an optional hint-like field, add support for the extended form and the
associated mnemonics (dcbfl and dcbflp).

Partially fixes PR24796.

llvm-svn: 280559
2016-09-02 23:41:54 +00:00
Zachary Turner 83849415de [codeview] Make FieldList records print as a yaml sequence.
Before we were kind of imitating the behavior of a Yaml sequence
by outputting each record one after the other.  This makes it a
little cumbersome when we want to go the other direction -- from
Yaml to Pdb.  So this treats FieldList records as no different than
any other list of records, by printing them as a Yaml sequence with
the exact same format.

llvm-svn: 280549
2016-09-02 22:19:01 +00:00
Xinliang David Li 40afd5c9e4 [Profile] handle select instruction in 'expect' lowering
Builtin expect lowering currently ignores select. This patch
fixes the issue

Differential Revision: http://reviews.llvm.org/D24166

llvm-svn: 280547
2016-09-02 22:03:40 +00:00
Hal Finkel 7b104d4721 [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.

llvm-svn: 280545
2016-09-02 21:37:07 +00:00
Jan Vesely ea45746d5a AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements.
Fixes R600 piglit regressions since r280298

Differential Revision: https://reviews.llvm.org/D24174

llvm-svn: 280535
2016-09-02 20:13:19 +00:00
Krzysztof Parzyszek 3bf4aeccd6 Do not consider subreg defs as reads when computing subrange liveness
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.

Differential Revision: https://reviews.llvm.org/D24190

llvm-svn: 280532
2016-09-02 19:48:55 +00:00
Sanjay Patel 70277411d3 [InstCombine] auto-generate assertions for tighter checking
llvm-svn: 280531
2016-09-02 19:38:37 +00:00
Jan Vesely 00864886f4 AMDGPU/R600: Expand unaligned writes to local and global AS
LOCAL and GLOBAL AS only
PRIVATE needs special treatment

Differential Revision: https://reviews.llvm.org/D23971

llvm-svn: 280526
2016-09-02 19:07:06 +00:00
Jan Vesely cd6b12b12e AMDGPU: Reorganize store tests
Split by AS.
Merge with some prviously failing tests.

Differential Revision: https://reviews.llvm.org/D23969

llvm-svn: 280523
2016-09-02 18:52:28 +00:00
Reid Kleckner fa28396f97 [codeview] Use the correct max CV record length of 0xFF00
Previously we were splitting our records at 0xFFFF bytes, which the
Microsoft tools don't like.

Should fix failure on the new Windows self-host buildbot.

This length appears in microsoft-pdb/PDB/dbi/dbiimpl.h

llvm-svn: 280522
2016-09-02 18:43:27 +00:00
Kyle Butt 8699921c4b IfConversion: Fix bug introduced by rescanning diamonds.
Passing the wrong values for predicate-clobbering. Simple to miss.
Added an assert to make this easier to catch in the future.

llvm-svn: 280517
2016-09-02 18:29:26 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Sanjay Patel 521f19f249 [InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126)
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.

Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.

This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126

Differential Revision: https://reviews.llvm.org/D23886

llvm-svn: 280504
2016-09-02 17:05:43 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Andrea Di Biagio 805815f407 [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN.
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.

An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.

Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".

This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.

Differential Revision: https://reviews.llvm.org/D24143

llvm-svn: 280488
2016-09-02 14:47:43 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Alexey Bataev ed27d69037 [InstCombine] Add test for insertelementinsts with constants.
Added a tests that shows that several insertelementinsts with constant
indexes/data are not folded into a single shuffleinst.

llvm-svn: 280474
2016-09-02 09:00:53 +00:00
George Rimar d8a4ecac3b [llvm-readobj] - Teach readobj to print DT_AUXILIARY dynamic tag in human readable form.
Previously DT_AUXILIARY was unknown, patch fixes that.

Differential revision: https://reviews.llvm.org/D24138

llvm-svn: 280471
2016-09-02 07:35:19 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
Craig Topper ad79bf4712 [AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded.
llvm-svn: 280467
2016-09-02 06:11:31 +00:00
Craig Topper 45d6503089 [AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type.
These are needed in order to remove the masked floating point logical operation intrinsics and use native IR.

llvm-svn: 280465
2016-09-02 05:29:13 +00:00
Craig Topper 00aecd97bf [AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same.
llvm-svn: 280464
2016-09-02 05:29:09 +00:00
Hal Finkel 5ef4b03106 [PowerPC] hasAndNotCompare should return true
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).

Fixes PR27203.

llvm-svn: 280457
2016-09-02 02:58:25 +00:00
Hal Finkel a39fd4bc53 [PowerPC] Add a pattern for a runtime bit check
Following a suggestion by Sanjay, we should lower:

  %shl = shl i32 1, %y
  %and = and i32 %x, %shl
  %cmp = icmp eq i32 %and, %shl
  ret i1 %cmp

into:

  subfic r4, r4, 32
  rlwnm r3, r3, r4, 31, 31

Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.

llvm-svn: 280454
2016-09-02 02:34:44 +00:00
NAKAMURA Takumi 9aec5c3797 llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes.
llvm-svn: 280451
2016-09-02 01:33:00 +00:00
Hal Finkel b54579fab6 [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.

Fixes PR28727.

llvm-svn: 280441
2016-09-02 00:28:20 +00:00
Hal Finkel 1e8218cc09 [PowerPC] Don't consider fusion in PPC64 address-formation peephole
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.

We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.

llvm-svn: 280440
2016-09-02 00:27:50 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Heejin Ahn c0f18172f5 [WebAssembly] Add asm.js-style setjmp/longjmp handling for wasm (reland r280302)
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D24121

llvm-svn: 280415
2016-09-01 21:05:15 +00:00
Tim Northover 8d8812c5d7 GlobalISel: add a G_PHI instruction to give phis a type.
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.

llvm-svn: 280412
2016-09-01 20:45:41 +00:00
Sanjay Patel 199a8e6a92 [InstCombine] add tests to show potential shuffle+insert folds
llvm-svn: 280403
2016-09-01 19:14:19 +00:00
Andrey Turetskiy cde38b6a99 [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions.
According to spec cvtdq2pd and cvtps2pd instructions don't require memory operand to be aligned
to 16 bytes. This patch removes this requirement from the memory folding table.

Differential Revision: https://reviews.llvm.org/D23919

llvm-svn: 280402
2016-09-01 18:50:02 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Simon Dardis bd27154757 [mips] interAptiv based generic schedule model
This scheduler describes a processor which covers all MIPS ISAs based
around the interAptiv and P5600 timings.

Reviewers: vkalintiris, dsanders

Differential Revision: https://reviews.llvm.org/D23551

llvm-svn: 280374
2016-09-01 14:53:53 +00:00