Commit Graph

141900 Commits

Author SHA1 Message Date
Matt Arsenault 7b00cf4706 AMDGPU: Fix isTypeDesirableForOp for i16
This should do nothing for targets without i16.

llvm-svn: 289235
2016-12-09 17:57:43 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Matt Arsenault 52facf0195 AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions
Fixes assembler regressions.

llvm-svn: 289230
2016-12-09 17:49:11 +00:00
Matt Arsenault eb4a55e066 AMDGPU: Clean up instruction bits
Sort the instruction bits by type and make sure there is one
for each format.

Also cleanup namespaces.

llvm-svn: 289229
2016-12-09 17:49:08 +00:00
Sean Fertile 1c4109b4c2 [PPC] Add intrinsics for vector extract word and vector insert word.
Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
2016-12-09 17:21:42 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Tom Stellard 2a48433fcf AMDGPU/SI: Don't mark VINTRP instructions as mayLoad
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.

shader-db stats:

SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, mareko, arsenm

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27593

llvm-svn: 289219
2016-12-09 15:57:15 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Sanjoy Das 004de6fe69 [SCEVExpander] Remove \brief, reflow comments; NFC
llvm-svn: 289216
2016-12-09 14:42:14 +00:00
Sanjoy Das 1f6b0433c8 [SCEVExpander] Use llvm data structures; NFC
llvm-svn: 289215
2016-12-09 14:42:11 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Benjamin Kramer eedc4059c3 Plug another leak in the DWARF unittests, DIEInlineStrings are never destroyed.
llvm-svn: 289208
2016-12-09 13:33:41 +00:00
Benjamin Kramer 9fcb7fe51e Fix memory leak in unit test.
The StringPool entries are destroyed with the allocator, the string pool
itself is not.

llvm-svn: 289207
2016-12-09 13:12:30 +00:00
NAKAMURA Takumi 6bd372bae7 llvm/test/Object/archive-thin-create.test: Make sure that %t is empty to stabilize the test.
llvm-svn: 289202
2016-12-09 11:44:57 +00:00
Dylan McKay 1cdbf42a33 [AVR] Remove a set of redundant tests
This fixes the build.

llvm-svn: 289201
2016-12-09 11:22:26 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Malcolm Parsons 1b1b02d25b Update Doxygen comment in StringSaver (NFC)
llvm-svn: 289196
2016-12-09 09:33:33 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Craig Topper 38b1b5d44f [X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match.
sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case.

There is a test case that does depend on this behavior.

llvm-svn: 289193
2016-12-09 07:57:21 +00:00
Dylan McKay 18ae0f68f8 [AVR] Use a more appropriate integer type for wide IN/OUT instructions
We could previously select an integer which would hit an assertion error
in pseudo expansion.

The new type will also generate the appropriate fixups if needed, which
wasn't done beforehand.

llvm-svn: 289192
2016-12-09 07:49:14 +00:00
Dylan McKay a5d49dfbb3 [AVR] Add tests for a large number of pseudo instructions
This adds MIR tests for 24 pseudo instructions.

llvm-svn: 289191
2016-12-09 07:49:04 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Matt Arsenault 27c062932a AMDGPU: Select i16 instructions to VOP3 forms
These were selecting directly to the VOP2 form instead
of VOP3 like the i32 instructions. Fixes regressions in
future commits where an immediate isn't folded because it was
initially used for the second operand.

Because uniform 16-bit operations are promoted to i32, it's
difficult to get a simple testcase where this matters. Fold
failures in SIFoldOperands here tend to be hidden by commute
and fold in SIShrinkInstructions.

llvm-svn: 289189
2016-12-09 06:19:12 +00:00
Peter Collingbourne 8db7e5e4ee Re-commit r289184, "Support: Use a 64-bit seek in raw_fd_ostream::seek()." with a configure-time check for lseek64.
llvm-svn: 289187
2016-12-09 05:20:43 +00:00
Craig Topper c4f2b0996d [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
llvm-svn: 289186
2016-12-09 05:20:11 +00:00
Peter Collingbourne f74fcdd30c Revert r289184, we need more configury for Darwin and *BSD.
llvm-svn: 289185
2016-12-09 05:04:30 +00:00
Peter Collingbourne 08ba509266 Support: Use a 64-bit seek in raw_fd_ostream::seek().
llvm-svn: 289184
2016-12-09 04:57:19 +00:00
Davide Italiano f8f391db16 [SCCP] Make the test added in r289175 more meaningful.
Add a comment while here.

llvm-svn: 289182
2016-12-09 03:49:20 +00:00
Davide Italiano 824d695231 [SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.
The motivating example is:

extern int patatino;
int goo() {
    int x = 0;
    for (int i = 0; i < 1000000; ++i) {
        x *= patatino;
    }
    return x;
}

Currently SCCP will not realize that this function returns always zero,
therefore will try to unroll and vectorize the loop at -O3 producing an
awful lot of (useless) code. With this change, it will just produce:

0000000000000000 <g>:
   xor    %eax,%eax
   retq

llvm-svn: 289175
2016-12-09 03:08:42 +00:00
Craig Topper 2aeb456425 [AVX-512] Add vpermilps/pd to load folding tables.
llvm-svn: 289173
2016-12-09 02:18:11 +00:00
Craig Topper df9de00928 [AVX-512] Move some floating point stack folding test cases out of the integer test.
llvm-svn: 289172
2016-12-09 02:18:07 +00:00
Craig Topper 107b187d2a [Analysis] Fix typo in comment. NFC
llvm-svn: 289171
2016-12-09 02:18:04 +00:00
Kostya Serebryany 111e1d69e3 [libFuzzer] implement crash-resistant merge (https://github.com/google/sanitizers/issues/722). This is a first experimental variant that needs some more testing, thus not yet adding a lit test (but there are unit tests).
llvm-svn: 289166
2016-12-09 01:17:24 +00:00
Peter Collingbourne 8786754cc3 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Chandler Carruth 86f0bdf832 [LCG] Minor cleanup to the LCG walk over a function, NFC.
This just hoists the check for declarations up a layer which allows
various sets used in the walk to be smaller. Also moves the relevant
comments to match, and catches a few other cleanups in this code.

llvm-svn: 289163
2016-12-09 00:46:44 +00:00
Peter Collingbourne 7a1e5bbe4e Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Chris Bieneman 313b326bb6 [ObjectYAML] Support for DWARF debug_aranges
This patch adds support for round tripping DWARF debug_aranges in and out of YAML.

llvm-svn: 289161
2016-12-09 00:26:44 +00:00
Sanjay Patel 568196bf7b [InstCombine] add tests for umin+icmp; NFC
llvm-svn: 289157
2016-12-08 23:44:58 +00:00
Sanjay Patel 73d8bd9905 [InstCombine] add tests for umax+icmp; NFC
llvm-svn: 289156
2016-12-08 23:36:57 +00:00
Zia Ansari 394cef803a [InstSimplify] Add "X / 1.0" to SimplifyFDivInst.
Differential Revision: https://reviews.llvm.org/D27587

llvm-svn: 289153
2016-12-08 23:27:40 +00:00
Sanjay Patel b641aa3f14 [InstCombine] add tests for smax+icmp; NFC
llvm-svn: 289151
2016-12-08 23:16:06 +00:00
Tim Northover b58346f2f2 GlobalISel: fall back gracefully for debug intrinsics.
Supporting them properly is a reasonably complex chunk of work, so to allow bot
testing before then we should at least be able to fall back to DAG ISel.

llvm-svn: 289150
2016-12-08 22:44:13 +00:00
Tim Northover 1e656ec137 GlobalISel: factor overflow handling into separate function. NFC.
llvm-svn: 289149
2016-12-08 22:44:00 +00:00
Davide Italiano 54c683f9e7 [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.
Currently SCCP folds the value to -1, while ConstantProp folds to
0. This changes SCCP to do what ConstantFolding does.

llvm-svn: 289147
2016-12-08 22:28:53 +00:00
Simon Atanasyan dccdfac877 [mips] Make the test case more specific and provide OS component of a triple. NFC
llvm-svn: 289117
2016-12-08 22:10:52 +00:00
Simon Atanasyan 71db32110b [mips] Change instruction s/daddiu/addiu/ since O32 prohibits the use of 64-bit GPRs. NFC
llvm-svn: 289115
2016-12-08 22:10:48 +00:00
Simon Atanasyan 7f64300a7e [mips] Change gnueabi to gnu in the triple because EABI has been removed recently. NFC
llvm-svn: 289114
2016-12-08 22:10:44 +00:00
Simon Atanasyan 7625e771d2 [mips] Remove N32 Android test because Android does not support N32 ABI. NFC
llvm-svn: 289113
2016-12-08 22:10:38 +00:00
Reid Kleckner 785e7d282c Don't emit .seh_handler directives for any cleanup funclets
We were falsely claiming that we had an LSDA for the relevant EH
personality before this change, which could lead to the EH machinery
interpreting random adjacent data as an LSDA.

Fixes PR31317

This change is safe because cleanups can't contain exception handlers
today. We do these things to maintain that invariant:
- C++ destructors are naturally out-of-line
- __finally blocks are outlined in clang
- LLVM's inliner will not inline EH constructs into cleanups

llvm-svn: 289101
2016-12-08 20:38:46 +00:00
Krzysztof Parzyszek 77a45576ef [RDF] Fix incorrect lane mask calculation
This was exposed by some code that used more than one level of sub-
registers. There is no testcase, because there is no such code in the
Hexagon backend.

llvm-svn: 289099
2016-12-08 20:33:45 +00:00
Sanjay Patel 2580c95dc1 [InstSimplify] add fdiv x/1.0 test and update checks; NFC
llvm-svn: 289098
2016-12-08 20:23:56 +00:00
Matt Arsenault e96d03745d AMDGPU: Make f16 ConstantFP legal
Not having this legal led to combine failures, resulting
in dumb things like bitcasts of constants not being folded
away.

The only reason I'm leaving the v_mov_b32 hack that f32
already uses is to avoid madak formation test regressions.
PeepholeOptimizer has an ordering issue where the immediate
fold attempt is into the sgpr->vgpr copy instead of the actual
use. Running it twice avoids that problem.

llvm-svn: 289096
2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin 73b54f4134 [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use
Differential Revision: https://reviews.llvm.org/D27225

llvm-svn: 289095
2016-12-08 20:07:23 +00:00
Matt Arsenault 6c06a6f48a AMDGPU: Fix commuting v_sub_u16
The correct commutable opcode was set to itself, so this
was simply swapping the operands to commute instead of also
changing the opcode to v_subrev_u16.

llvm-svn: 289093
2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin 50ea93a2bd [AMDGPU] Add amdgpu-unify-metadata pass
Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates semantic problem because we cannot tell which
version of OpenCL the composite module conforms.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and stored on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.

The pass unifies metadata after linking.

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092
2016-12-08 19:46:04 +00:00
Peter Collingbourne 235c275b20 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00
Chris Bieneman fbf7dfe1ba [ObjectYAML] Remove DWARF from class names
Since all the DWARF classes are in a DWARFYAML namespace having every class start with DWARF seems like a bit of overkill.

llvm-svn: 289080
2016-12-08 17:46:57 +00:00
Alexander Timofeev 18009560c5 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076
2016-12-08 17:28:47 +00:00
Keno Fischer dc09119776 ConstantFolding: Don't crash when encountering vector GEP
ConstantFolding tried to cast one of the scalar indices to a vector
type. Instead, use the vector type only for the first index (which
is the only one allowed to be a vector) and use its scalar type
otherwise.

Fixes PR31250.

Reviewers: majnemer
Differential Revision: https://reviews.llvm.org/D27389

llvm-svn: 289073
2016-12-08 17:22:35 +00:00
Greg Clayton b90328356a Fix ASAN buildbots by fixing a double free crash.
The dwarfgen::Generator::StringPool was in a unique_ptr but it was owned by the Allocator member variable so it was being free twice.

llvm-svn: 289070
2016-12-08 16:57:04 +00:00
NAKAMURA Takumi 689493bb12 Prune unused libdeps.
llvm-svn: 289060
2016-12-08 15:28:02 +00:00
NAKAMURA Takumi 66c8fa94a6 Prune unused \param(s) in r289050. [-Wdocumentation]
llvm-svn: 289057
2016-12-08 15:00:12 +00:00
NAKAMURA Takumi a495f882be DIE::addAttribute(): Prune a redundant \param. [-Wdocumentation]
llvm-svn: 289056
2016-12-08 15:00:07 +00:00
NAKAMURA Takumi 9ccd966612 LanaiInstPrinter: Prune unused libdeps.
llvm-svn: 289054
2016-12-08 14:26:30 +00:00
NAKAMURA Takumi b0f7b03711 DebugInfoDWARFTests: Prune unused libdeps.
llvm-svn: 289053
2016-12-08 14:26:23 +00:00
NAKAMURA Takumi fdf3edeb0c DebugInfoDWARFTests: Add missing deps, AsmPrinter and Object.
llvm-svn: 289052
2016-12-08 14:11:02 +00:00
NAKAMURA Takumi bf177380ab DebugInfoDWARFTests: Reorder LLVM_LINK_COMPONENTS.
llvm-svn: 289051
2016-12-08 14:10:57 +00:00
Nicolai Haehnle f08dc90253 [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
Summary:
Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types.

Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and
a sequence of half-width multiplications otherwise. Promote uses a wider
multiply.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD

Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24956

llvm-svn: 289050
2016-12-08 14:08:14 +00:00
Nicolai Haehnle 3c67a08d1b X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math
This re-adds checks for the patterns that were disabled with r288506.

Reviewers: spatel, delena, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27346

llvm-svn: 289049
2016-12-08 14:08:08 +00:00
Nicolai Haehnle 2857dc3893 AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg
Summary:
Without the fix to isFrameOffsetLegal to consider the instruction's
immediate offset, the new test case hits the corresponding assertion in
resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a
different base register.

With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of
places because frame base registers are added where they're not needed.
This is addressed by properly implementing needsFrameBaseReg, which also
helps to avoid unnecessary zero frame indices in a bunch of other places.

Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D27344

llvm-svn: 289048
2016-12-08 14:08:02 +00:00
Daniel Jasper 0f77869d58 Move DwarfGenerator.cpp to unittests
So far it creates a test helper and so it should be moved there. It also
create a layering cycle between CodeGen and CodeGen/AsmPrinter, which
should be avoided.

Review: https://reviews.llvm.org/D27570
llvm-svn: 289044
2016-12-08 12:45:29 +00:00
Alexey Bataev 4f0d469d45 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 289043
2016-12-08 11:57:51 +00:00
Pavel Labath 82b95acfbe Fix MSCV compilation broken by r289040
I wanted to use the "not" keyword to make sure it does not get lost in between
other checks. MSVC does not like that.

llvm-svn: 289041
2016-12-08 11:45:38 +00:00
Pavel Labath fefefeb7f6 Improve format member detection in llvm::formatv
Summary:
The existing detection of a format member function has a couple of deficiencies:
- the member function does not get detected if one calls formatv with an lvalue,
  because the template parameter gets deduced as T&, which fails the is_class
  check.
- it also did not work if the function was called with a const variable because
  the template parameter would get deduced as const T&, again failing the
  is_class check.

This fixes the problem by stripping the references in the uses_format_member
template, to make sure the type is correctly detected as class. It also provides
specializations of the has_FormatMember template for const and non-const members
of the types in order to enable declaring the format member as a "const"
function. I have added tests that verify that formatv can be now called in these
scenarios. As some scenarios could not be verified at runtime (e.g. making sure
that calling a non-const format member on a const object does *not* compile), I
have also added some static_asserts which test the behaviour of the template
classes used internally by formatv().

Reviewers: zturner

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27525

llvm-svn: 289040
2016-12-08 11:31:19 +00:00
Dylan McKay 371117e7a5 [AVR] Add MIR tests for pseudo instruction expansions
This adds tests for 13 pseudo instruction expansions.

llvm-svn: 289039
2016-12-08 10:52:13 +00:00
Simon Pilgrim 413c8e217f Wdocumentation fix
llvm-svn: 289038
2016-12-08 10:41:41 +00:00
Simon Pilgrim d35d067e47 Wdocumentation fix
llvm-svn: 289037
2016-12-08 10:31:32 +00:00
Oliver Stannard 68e7c21ca0 Add a comment consumer mechanism to MCAsmLexer
This allows clients to register an AsmCommentConsumer with the MCAsmLexer,
which receives a callback each time a comment is parsed.

Differential Revision: https://reviews.llvm.org/D27511

llvm-svn: 289036
2016-12-08 10:31:21 +00:00
Simon Pilgrim d9c53710d5 [X86][SSE] Add vector test for (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) detailed in D19325
llvm-svn: 289035
2016-12-08 10:17:25 +00:00
Dylan McKay 0cc0446ad2 [AVR] Add MIR tests for a few pseudo instructions
llvm-svn: 289031
2016-12-08 08:54:41 +00:00
Dylan McKay fac9ce5413 [AVR] Add an assertion to ensure we don't emit LPM when it's unsupported
llvm-svn: 289030
2016-12-08 08:34:13 +00:00
Peter Collingbourne f4257528e9 LTO: Hash the parts of the LTO configuration that affect code generation.
Most importantly, we need to hash the relocation model, otherwise we can
end up trying to link non-PIC object files into PIEs or DSOs.

Differential Revision: https://reviews.llvm.org/D27556

llvm-svn: 289024
2016-12-08 05:28:30 +00:00
Greg Clayton fd461fe360 Unbreak buildbots where the debug info test was crashing due to unchecked error.
llvm-svn: 289017
2016-12-08 02:11:03 +00:00
Keno Fischer d4ea4c18f1 Revert "[CodeGen] Fix invalid DWARF info on Win64"
Appears to break on build bots. Reverting pending investigation.

llvm-svn: 289014
2016-12-08 01:56:23 +00:00
Keno Fischer 460218fb7d [CodeGen] Fix invalid DWARF info on Win64
The relocations for `DIEEntry::EmitValue` were wrong for Win64
(emitting FK_Data_4 instead of FK_SecRel_4). This corrects that
oversight so that the DWARF data is correct in Win64 COFF files.

Fixes PR15393.

Patch by Jameson Nash <jameson@juliacomputing.com> based on a patch
by David Majnemer.

Differential Revision: https://reviews.llvm.org/D21731

llvm-svn: 289013
2016-12-08 01:40:21 +00:00
Greg Clayton 3462a420d1 Make a DWARF generator so we can unit test DWARF APIs with gtest.
The only tests we have for the DWARF parser are the tests that use llvm-dwarfdump and expect output from textual dumps.

More DWARF parser modification are coming in the next few weeks and I wanted to add tests that can verify that we can encode and decode all form types, as well as test some other basic DWARF APIs where we ask DIE objects for their children and siblings.

DwarfGenerator.cpp was added in the lib/CodeGen directory. This file contains the code necessary to easily create DWARF for tests:

dwarfgen::Generator DG;
Triple Triple("x86_64--");
bool success = DG.init(Triple, Version);
if (!success)
  return;
dwarfgen::CompileUnit &CU = DG.addCompileUnit();
dwarfgen::DIE CUDie = CU.getUnitDIE();

CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

dwarfgen::DIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

dwarfgen::DIE IntDie = CUDie.addChild(DW_TAG_base_type);
IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

dwarfgen::DIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
// ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref4, IntDie);
ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie);

StringRef FileBytes = DG.generate();
MemoryBufferRef FileBuffer(FileBytes, "dwarf");
auto Obj = object::ObjectFile::createObjectFile(FileBuffer);
EXPECT_TRUE((bool)Obj);
DWARFContextInMemory DwarfContext(*Obj.get());
This code is backed by the AsmPrinter code that emits DWARF for the actual compiler.

While adding unit tests it was discovered that DIEValue that used DIEEntry as their values had bugs where DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref8, and DW_FORM_ref_udata forms were not supported. These are all now supported. Added support for DW_FORM_string so we can emit inlined C strings.

Centralized the code to unique abbreviations into a new DIEAbbrevSet class and made both the dwarfgen::Generator and the llvm::DwarfFile classes use the new class.

Fixed comments in the llvm::DIE class so that the Offset is known to be the compile/type unit offset.

DIEInteger now supports more DW_FORM values.

There are also unit tests that cover:

Encoding and decoding all form types and values
Encoding and decoding all reference types (DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, DW_FORM_ref8, DW_FORM_ref_udata, DW_FORM_ref_addr) including cross compile unit references with that go forward one compile unit and backward on compile unit.

Differential Revision: https://reviews.llvm.org/D27326

llvm-svn: 289010
2016-12-08 01:03:48 +00:00
Evgeniy Stepanov 0c8957c198 CFI-icall on Thumb
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.

This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc.

Differential Revision: https://reviews.llvm.org/D27499

llvm-svn: 289008
2016-12-08 00:32:26 +00:00
Peter Collingbourne cfef0cd31b LTO: Remove the unused Config::Features field.
We are currently initializing Features via MAttrs.

llvm-svn: 289007
2016-12-08 00:27:37 +00:00
Matthias Braun 9ee1a1df24 The few days mentioned in r267095 are over
llvm-svn: 289004
2016-12-08 00:16:42 +00:00
Matthias Braun e2d2ead661 TargetPassConfig: Rename DisablePostRA -> DisablePostRASched; NFC
llvm-svn: 289003
2016-12-08 00:16:08 +00:00
Matthias Braun 0c989a893b LivePhysReg: Use reference instead of pointer in init(); NFC
llvm-svn: 289002
2016-12-08 00:15:51 +00:00
Quentin Colombet ae3168da3f [InlineSpiller] Don't call TargetInstrInfo::foldMemoryOperand with an empty list.
Since r287792 if we try to do that we will hit an assert.

llvm-svn: 289001
2016-12-08 00:06:51 +00:00
Filipe Cabecinhas d5b21b2418 [asan] Split load and store checks in test. NFCI
llvm-svn: 288991
2016-12-07 22:37:11 +00:00
Chris Bieneman 7d7364ab4f [yaml2obj] Refactor and abstract yaml2dwarf functions
This abstracts the code for emitting DWARF binary from the DWARFYAML types into reusable interfaces that could be used by ELF and COFF.

llvm-svn: 288990
2016-12-07 22:30:15 +00:00
Eugene Zelenko 9408c61830 [ADT, IR] Fix some Clang-tidy modernize-use-equals-delete and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 288989
2016-12-07 22:06:02 +00:00
Davide Italiano 1ed5396304 [BDCE] Skip metadata while replacing uses.
The fix committed in r288851 doesn't cover all the cases.
In particular, if we have an instruction with side effects
which has a no non-dbg use not depending on the bits, we still
perform RAUW destroying the dbg.value's first argument.
Prevent metadata from being replaced here to avoid the issue.

Differential Revision:  https://reviews.llvm.org/D27534

llvm-svn: 288987
2016-12-07 21:47:32 +00:00
Chris Bieneman 26d060fbf9 [obj2yaml] Refactor and abstract dwarf2yaml
This makes the dwarf2yaml code separated and reusable allowing ELF and COFF to share implementations with MachO.

llvm-svn: 288986
2016-12-07 21:47:28 +00:00
Tim Northover c53606ef02 GlobalISel: use correct builder for ConstantExprs.
ConstantExpr instances were emitting code into the current block rather than
the entry block. This meant they didn't necessarily dominate all uses, which is
clearly wrong.

llvm-svn: 288985
2016-12-07 21:29:15 +00:00
Chris Bieneman 79e60eb948 [ObjectYAML] Pull DWARF support into DWARFYAML namespace
Since DWARF formatting is agnostic to the object file it is stored in, it doesn't make sense for this to be in the MachOYAML implementation. Pulling it into its own namespace means we could modify the ELF and COFF YAML tools to emit DWARF as well.

In a follow-up patch I will better abstract this in obj2yaml and yaml2obj so that the DWARF bits in the tools can be re-used too.

llvm-svn: 288984
2016-12-07 21:26:32 +00:00
Tim Northover 50db7f416c GlobalISel: store the current MachineFunction as direct state. NFC.
Having to ask the MIRBuilder for the current function is a little awkward, and
I'm intending to improve how that's threaded through anyway.

llvm-svn: 288983
2016-12-07 21:17:47 +00:00
Chris Bieneman 25ec226dfc [ObjectYAML] Rename DWARF entries to match section names
This change makes the yaml tags for the members of the DWARF data match the names of the DWARF sections.

llvm-svn: 288981
2016-12-07 21:09:37 +00:00
Tim Northover 05cc4859ad GlobalISel: simplify MachineIRBuilder interface.
MachineIRBuilder had weird before/after and beginning/end flags for the insert
point. Unfortunately the non-default means that instructions will be inserted
in reverse order which is almost never what anyone wants.

Really, I think we just want (like IRBuilder has) the ability to insert at any
C++ iterator-style point (i.e. before any instruction or before MBB.end()). So
this fixes MIRBuilders to behave like IRBuilders in this respect.

llvm-svn: 288980
2016-12-07 21:05:38 +00:00
Kostya Serebryany 64a055549a [libFuzzer] include FuzzerIO.h and hopefully fix the Mac build. reported by Dejan Mircevski
llvm-svn: 288979
2016-12-07 21:02:48 +00:00
Matt Arsenault 624e1b348c InstCombine: Fold bitcast of vector to FP scalar
llvm-svn: 288978
2016-12-07 20:56:11 +00:00
Chris Bieneman bb41361814 [CMake] Add check for HAVE_CRASHREPORTER_INFO
This was also explicitly undef in CMake for some unknown reason.

Hopefully this one won't kill all the bots.

llvm-svn: 288977
2016-12-07 20:55:38 +00:00
Eli Friedman c6885fc369 [GVNHoist] Invalidate MemDep when an instruction is moved.
See also r279907.

Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 .

Differential Revision: https://reviews.llvm.org/D27493

llvm-svn: 288968
2016-12-07 19:55:59 +00:00
Michael Kuperstein 5842b20633 [X86] Skip over DEBUG_VALUE while looking for start of call sequence
If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g
code.

This fixes PR31242.

Differential Revision: https://reviews.llvm.org/D27485

llvm-svn: 288965
2016-12-07 19:31:08 +00:00
Michael Kuperstein 18092cf2c3 [X86] Do not assume "ri" instructions always have an immediate operand
The second operand of an "ri" instruction may be an immediate, but it may
also be a globalvariable, so we should make any assumptions.

This fixes PR31271.

Differential Revision: https://reviews.llvm.org/D27481

llvm-svn: 288964
2016-12-07 19:29:18 +00:00
Chris Bieneman bfff254a10 Fix the apple build issue caused by r288956
Should be checking if HAVE_CRASHREPORTERCLIENT_H is defined not relying on it having a value.

llvm-svn: 288963
2016-12-07 19:28:22 +00:00
Chris Bieneman 34264e1769 Revert "[CMake] Use cmakedefine01 instead of cmakedefine"
This reverts commit r288959.

Apparently using cmakedefine01 explodes.

llvm-svn: 288961
2016-12-07 19:25:38 +00:00
Chris Bieneman f3ecc9a1a8 [CMake] Use cmakedefine01 instead of cmakedefine
Looks like we need a 01 value for HAVE_CRASHREPORTERCLIENT_H.

llvm-svn: 288959
2016-12-07 19:13:32 +00:00
Sanjay Patel 964c735f86 [InstCombine] add tests for smin+icmp; NFC
The tests that already work are folded in InstSimplify, so those
tests should be redundant and we can remove them if they don't
seem worthwhile for completeness.

llvm-svn: 288957
2016-12-07 18:56:55 +00:00
Chris Bieneman df11f53b67 [CMake] Add a check for HAVE_CRASHREPORTERCLIENT_H
The CMake build has been hardcoding this to undef forever, we shouldn't have been doing that.

llvm-svn: 288956
2016-12-07 18:53:04 +00:00
Chris Bieneman c6c0e54d3d [ObjectYAML] Support for DWARF __debug_abbrev section
This patch adds support for round-tripping DWARF debug abbreviations through the obj<->yaml tools.

llvm-svn: 288955
2016-12-07 18:52:59 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim ef76b83164 [X86] Add knownbits vector UMAX test
In preparation for demandedelts support

llvm-svn: 288920
2016-12-07 17:21:13 +00:00
Simon Pilgrim c3c6463ce0 [X86][SSE] Remove AND -> VZEXT combine
This is now performed more generally by the target shuffle combine code.

Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....).

Differential Revision: https://reviews.llvm.org/D27510

llvm-svn: 288918
2016-12-07 17:02:41 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim ff79f31328 [SelectionDAG] Removed old knownbits TODO comment. NFCI.
EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range.

llvm-svn: 288913
2016-12-07 15:31:12 +00:00
Simon Pilgrim b421ef2370 [X86] Add test to show missed opportunities to calculate knownbits in INSERT_VECTOR_ELT
llvm-svn: 288912
2016-12-07 15:27:18 +00:00
Simon Pilgrim 33f2a669c1 [X86][SSE] Fix vpextrd/vpextrq checks
They were testing for the pre-vex versions

llvm-svn: 288911
2016-12-07 15:10:05 +00:00
Simon Pilgrim 4b1ebf97fc [X86][SSE] Force execution domain of 32-bit extractps/pextrd in the stack folding tests
llvm-svn: 288910
2016-12-07 15:06:14 +00:00
Matthew Simpson 364da7e527 [LV] Scalarize operands of predicated instructions
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.

Differential Revision: https://reviews.llvm.org/D26083

llvm-svn: 288909
2016-12-07 15:03:32 +00:00
Benjamin Kramer b1332d8bf6 Try unbreaking the MSVC build.
llvm-svn: 288907
2016-12-07 13:35:11 +00:00
Simon Pilgrim e75ff02269 [X86][SSE] Regenerate test.
llvm-svn: 288906
2016-12-07 13:05:04 +00:00
Dylan McKay 99b756eb40 [AVR] Expand 'SELECT_CC' nodes whereever possible
llvm-svn: 288905
2016-12-07 12:34:47 +00:00
Benjamin Kramer 926ab5b00b [LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects.
Also avoid allocating ~3x as much memory as needed.

llvm-svn: 288904
2016-12-07 12:31:45 +00:00
Andrea Di Biagio ae5780104f When GVN removes a redundant load, it should not modify the debug location of the dominating load.
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk to
introduce an incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.

Differential Revision: https://reviews.llvm.org/D27468

llvm-svn: 288903
2016-12-07 12:31:36 +00:00
Simon Pilgrim 8893bd95f0 [X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain
We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain.

Differential Revision: https://reviews.llvm.org/D27419

llvm-svn: 288902
2016-12-07 12:10:49 +00:00
Andrea Di Biagio eff22832c0 [InlineFunction] Refactor code in function `fixupLineNumbers' as suggested by David in D27462. NFC
llvm-svn: 288901
2016-12-07 12:01:45 +00:00
Simon Dardis 615bac37cd [mips][rtdyld] Merge code to write relocated values to the section. NFC
Preparation work for implementing N32 support.

Patch By: Daniel Sanders

Reviewers: vkalintiris, atanasyan

Differential Revision: https://reviews.llvm.org/D27460

llvm-svn: 288900
2016-12-07 11:41:23 +00:00
Dylan McKay 6dbc8d5a0c [AVR] Move a pseudo expansion test into a folder
llvm-svn: 288899
2016-12-07 11:21:45 +00:00
Simon Pilgrim d5bc5c16b2 [X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296)
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.

Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.

llvm-svn: 288898
2016-12-07 11:19:00 +00:00
Dylan McKay 8cec7eb6dd [AVR] Allow loading from stack slots where src and dest registers are identical
Fixes PR 31256

llvm-svn: 288897
2016-12-07 11:08:56 +00:00
Andrea Di Biagio 32d5aedd5b [InlineFunction] Do not propagate the callsite debug location to instructions inlined from functions with debug info.
When a function F is inlined, InlineFunction extends the debug location of every
instruction inlined from F by adding an InlinedAt.

However, if an instruction has a 'null' debug location, InlineFunction would
propagate the callsite debug location to it. This behavior existed since
revision 210459.

Revision 210459 was originally committed specifically to workaround the lack of
debug information for instructions inlined from intrinsic functions (which are
usually declared with attributes `__always_inline__, __nodebug__`).

The problem with revision 210459 is that it doesn't make any sort of distinction
between instructions inlined from a 'nodebug' function and instructions which
are inlined from a function built with debug info. This issue may lead to
incorrect stepping in the debugger.

This patch works under the assumption that a nodebug function does not have a
DISubprogram. When a function F is inlined into another function G,
InlineFunction checks if F has debug info associated with it.

For nodebug functions, the InlineFunction logic is unchanged (i.e. it would
still propagate the callsite debugloc to the inlined instructions). Otherwise,
InlineFunction no longer propagates the callsite debug location.

Differential Revision: https://reviews.llvm.org/D27462

llvm-svn: 288895
2016-12-07 10:37:26 +00:00
Chandler Carruth e7b71a7eb8 [PM] Add some more logging to make it more clear when the CGSCC
infrastrucutre is skipping SCCs and RefSCCs.

llvm-svn: 288894
2016-12-07 10:33:15 +00:00
Philip Reames 02bb6a6b0b Reintroduce a check accidentally removed in 288873 to fix clang bots
I believe this is the cause of the failure, but have not been able to confirm.  Note that this is a speculative fix; I'm still waiting for a full build to finish as I synced and ended up doing a clean build which takes 20+ minutes on my machine.

llvm-svn: 288886
2016-12-07 04:48:50 +00:00
Philip Reames 29b19f0e9e Fix a warning introduced in r288874
llvm-svn: 288884
2016-12-07 04:11:22 +00:00
Peter Collingbourne 67ec0eb531 LowerTypeTests: Add a test that covers "unsatisfiable" type metadata.
llvm-svn: 288881
2016-12-07 03:04:34 +00:00
Tom Stellard 8485fa096e AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
Patch By: Wei Ding

Summary: This patch fixes the fdiv precision issues.

Reviewers: b-sumner, cfang, wdng, arsenm

Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26424

llvm-svn: 288879
2016-12-07 02:42:15 +00:00
Haicheng Wu f8b834049a [AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()
In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511].

Differential Revision: https://reviews.llvm.org/D27480

llvm-svn: 288876
2016-12-07 01:45:04 +00:00
Chandler Carruth 5205c35075 [LCG] Add basic verification of the parent set and fix bugs it uncovers.
The existing unittests actually cover this now that we verify things.

llvm-svn: 288875
2016-12-07 01:42:40 +00:00
Philip Reames 71a496777c [LVI] Remove used return value from markX functions
llvm-svn: 288874
2016-12-07 01:03:56 +00:00
Philip Reames b47a719ac0 [LVI] Simplify mergeIn code
Remove the unused return type, use early return, use assignment operator.

llvm-svn: 288873
2016-12-07 00:54:21 +00:00
Philip Reames 864ab5c516 [LVI] Simplify obfuscated code
It doesn't matter why something is overdefined if it is...

llvm-svn: 288871
2016-12-07 00:28:28 +00:00
Peter Collingbourne 6f0b4f2e89 IR: Reduce the amount of boilerplate required for a metadata kind. NFCI.
llvm-svn: 288867
2016-12-06 23:53:01 +00:00
Tom Stellard 2187bb8a89 AMDGPU: Add llvm.amdgcn.interp.mov intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26725

llvm-svn: 288865
2016-12-06 23:52:13 +00:00
Davide Italiano 7f1bad88c3 [llc] Fix -stop-after=consthoist initializing the pass.
llvm-svn: 288864
2016-12-06 23:49:58 +00:00
Matt Arsenault 269ffdac4e AMDGPU: Fix crash on i16 constant expression
llvm-svn: 288861
2016-12-06 23:18:06 +00:00
Peter Collingbourne 7357b2ad62 LowerTypeTests: Improve performance by optimising type metadata queries.
Requesting metadata for a global is a relatively expensive operation as it
involves a map lookup, but it's one that we need to do relatively frequently in
this pass to collect the list of type metadata nodes associated with a global.
This change improves the performance of type metadata queries by prebuilding
data structures that keep the global together with its list of type metadata,
and changing the pass to use that data structure wherever we were previously
passing global references around.

This change also eliminates some O(N^2) behavior by collecting the list of
globals associated with each type identifier during the first pass over the
list of globals rather than visiting each global to compute that list every
time we add a new type identifier.

Reduces pass runtime on a module containing Chrome's vtables from over 60s
to 0.9s.

Differential Revision: https://reviews.llvm.org/D27484

llvm-svn: 288859
2016-12-06 23:02:13 +00:00
Simon Pilgrim 0559b9e557 [X86][XOP] Add test case for PR31296
llvm-svn: 288858
2016-12-06 22:50:13 +00:00
Eli Friedman 0a76e3241f [CodeGen] Fix result type for SMULO/UMULO legalization
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.

This fixes issue https://github.com/rust-lang/rust/issues/37829

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D27154

llvm-svn: 288857
2016-12-06 22:49:36 +00:00
Matt Arsenault ac066f354a AMDGPU: Fix operand name for v_interp_*
Other VOP instructions call the output vdst

llvm-svn: 288856
2016-12-06 22:29:43 +00:00
Sanjay Patel 5369775a84 [InstSimplify] fixed (?) to not mutate icmps
As Eli noted in the post-commit thread for r288833, the use of
swapOperands() may not be allowed in InstSimplify, so I'm 
removing those calls here pending further review. 

The swap mutates the icmp, and there doesn't appear to be precedent
for instruction mutation in InstSimplify.

I didn't actually have any tests for those cases, so I'm adding
a few here. 

llvm-svn: 288855
2016-12-06 22:09:52 +00:00
Eugene Zelenko 40b89ff81e [IR] Fix some Clang-tidy modernize-use-equals-delete and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 288853
2016-12-06 22:00:57 +00:00
Tom Stellard 175959e350 AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27416

llvm-svn: 288852
2016-12-06 21:53:10 +00:00
Davide Italiano 043e66137c [BDCE/DebugInfo] Preserve llvm.dbg.value's argument.
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.

Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17

but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.

Differential Revision:  https://reviews.llvm.org/D27471

llvm-svn: 288851
2016-12-06 21:52:47 +00:00
Tom Stellard 00cfa74715 AMDGPU/SI: Don't move copies of immediates to the VALU
Summary:
If we write an immediate to a VGPR and then copy the VGPR to an
SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than
moving the copy to the SALU.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27272

llvm-svn: 288849
2016-12-06 21:13:30 +00:00
Tim Northover 14ceb45fb4 GlobalISel: correctly handle small args via memory.
We were rounding size in bits down rather than up, leading to 0-sized slots for
i1 (assert!) and bugs for other types not byte-aligned.

llvm-svn: 288848
2016-12-06 21:02:19 +00:00
Zvi Rackover 8bc7e4da51 [X86] Prefer reduced width multiplication over pmulld on Silvermont
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.

Reviewers: wmi, delena, mkuper

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D27203

llvm-svn: 288844
2016-12-06 19:35:20 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Sanjay Patel 9b1b2de348 [InstSimplify] add folds for and-of-icmps with same operands
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.

This is the 'and' sibling of the earlier 'or' patch:
https://reviews.llvm.org/rL288833

llvm-svn: 288841
2016-12-06 19:05:46 +00:00
Tim Northover 0a683e7bfd GlobalISel: fall back gracefully when we hit unhandled legalizer default.
llvm-svn: 288840
2016-12-06 19:02:15 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Sanjay Patel 827414876f [InstSimplify] add tests for and-of-icmps; NFC
llvm-svn: 288837
2016-12-06 18:46:54 +00:00
Tim Northover c1a23854f3 GlobalISel: handle G_SEQUENCE fallbacks gracefully.
There were two problems:
  + AArch64 was reusing random data from its binary op tables, which is
    complete nonsense for G_SEQUENCE.
  + Even when AArch64 gave up and said it couldn't handle G_SEQUENCE,
    the generic code asserted.

llvm-svn: 288836
2016-12-06 18:38:38 +00:00
Tim Northover f50f2f3d32 GlobalISel: allow G_SELECT instructions for pointers.
llvm-svn: 288835
2016-12-06 18:38:34 +00:00
Tim Northover 405e25cd6a GlobalISel: stop the legalizer from trying to handle oddly-sized types.
It'll almost immediately fail because it always tries to half/double the size
until it finds a legal one. Unfortunately, this triggers an assertion
preventing the DAG fallback from being possible.

llvm-svn: 288834
2016-12-06 18:38:29 +00:00
Sanjay Patel d0ccdb46b9 [InstSimplify] add folds for or-of-icmps with same operands
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.

llvm-svn: 288833
2016-12-06 18:09:37 +00:00
George Rimar 114d335bf9 [llvm-readobj] - Teach readobj to print PT_OPENBSD_BOOTDATA header
These are OpenBSD specific program headers.

OpenBSD commit:
d39116912b

It is required for fixing PR31288.

Differential revision: https://reviews.llvm.org/D27456

llvm-svn: 288831
2016-12-06 17:55:52 +00:00
Sanjay Patel 6d4444f931 [InstSimplify] add tests for or-of-icmps; NFC
llvm-svn: 288830
2016-12-06 17:49:10 +00:00
Chris Bieneman ec758fad08 [CMake] Fixing clang standalone build
I broke this in r288770.

llvm-svn: 288829
2016-12-06 17:09:29 +00:00
Simon Pilgrim 4a2979ce12 [X86][SSE] Add knownbits test demonstrating demandedelts not ignoring undef shuffle elements
llvm-svn: 288825
2016-12-06 17:00:47 +00:00
Simon Pilgrim 0caaadfc2d [X86][SSE] Added vector sext_in_reg combine tests
llvm-svn: 288819
2016-12-06 15:57:26 +00:00
George Rimar 3840121cdd Removed trailing whitespaces. NFC.
llvm-svn: 288817
2016-12-06 15:40:02 +00:00
George Rimar 92b54b5d87 [Support/ELF] - Add OpenBSD PT_OPENBSD_BOOTDATA constant.
OpenBSD commit for reference:
d39116912b

llvm-svn: 288816
2016-12-06 15:38:15 +00:00
Simon Pilgrim 7c7b649639 [X86] Improve UMAX/UMIN knownbits test
Test the sequential effect of each op

llvm-svn: 288815
2016-12-06 15:17:50 +00:00
Simon Pilgrim 29c17f3f58 Avoid repeated calls to Op.getOpcode(). NFCI.
llvm-svn: 288814
2016-12-06 14:50:09 +00:00
Daniel Sanders 4fd1e7c628 [globalisel][aarch64] Fix unintended assumptions about PartialMappingIdx. NFC.
Summary:
This is NFC but prevents assertions when PartialMappingIdx is tablegen-erated.
The assumptions were:
1) FirstGPR is 0
2) FirstGPR is the first of the First* enumerators.

GPR32 is changed to 1 to demonstrate that assumption #1 is fixed. #2 will
be covered by a subsequent patch that tablegen-erates information and swaps
the order of GPR and FPR as a side effect.

Depends on D27336

Reviewers: ab, t.p.northover, qcolombet

Subscribers: aemerson, rengolin, vkalintiris, dberris, rovka, llvm-commits

Differential Revision: https://reviews.llvm.org/D27337

llvm-svn: 288812
2016-12-06 14:39:57 +00:00
Daniel Sanders 21765cb15e [globalisel][aarch64] Replace magic numbers with corresponding enumerators in ValMappings. NFC
Reviewers: ab, t.p.northover, qcolombet

Subscribers: aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27336

llvm-svn: 288810
2016-12-06 13:55:01 +00:00
Daniel Sanders 605f8cd30d [globalisel][aarch64] Correct argument names in comments.
llvm-svn: 288809
2016-12-06 13:48:58 +00:00
Simon Pilgrim e633741c3a [SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations
e.g.
buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32))

llvm-svn: 288807
2016-12-06 13:29:55 +00:00
Oliver Stannard 870b5cad45 [ARM] Better error message for invalid flag-preserving Thumb1 insts
When we see a non flag-setting instruction for which only the flag-setting
version is available in Thumb1, we should give a better error message than
"invalid instruction".

Differential Revision: https://reviews.llvm.org/D27414

llvm-svn: 288805
2016-12-06 12:59:08 +00:00
Ayman Musa 86c00b799f [X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting.
Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"

Differential Revision: https://reviews.llvm.org/D26802

llvm-svn: 288804
2016-12-06 12:24:14 +00:00
Simon Pilgrim ae63dd10f8 [X86] Add tests to show missed opportunities to calculate knownbits in SMAX/SMIN/UMAX/UMIN
llvm-svn: 288801
2016-12-06 12:12:20 +00:00
Nemanja Ivanovic 15748f4921 [PowerPC] Improvements for BUILD_VECTOR Vol. 4
This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.

Differential Revision: https://reviews.llvm.org/D26066

llvm-svn: 288800
2016-12-06 11:47:14 +00:00
Daniel Sanders bfd5ff155a [globalisel][aarch64] Prefix PartialMappingIdx enumerators with 'PMI_' to fit coding standards.
This also stops things like 'None' polluting the llvm::AArch64 namespace.

llvm-svn: 288799
2016-12-06 11:33:04 +00:00
Simon Pilgrim 4c2885947e Fix MSVC -Wmicrosoft-enum-value 'enumerator value is not representable' warning
llvm-svn: 288798
2016-12-06 11:27:19 +00:00
Simon Pilgrim 9335c020c6 Fix MSVC bool to uint64_t promotion warning
llvm-svn: 288796
2016-12-06 11:12:53 +00:00
Chandler Carruth 23a6c3f746 [LCG] Add some much needed asserts and verify runs to uncover
a hilarious bug and fix it.

We somehow were never verifying the RefSCCs newly formed when
splitting an existing one apart, and when verifying them we weren't
really checking the SCC indices mapping effectively.

If we had been, it would have been blindingly obvious that right after
putting something int `RC.SCCs` we should update `RC.SCCIndices` instead
of `SCCIndices` which we were about to clear and rebuild anyways. =[

Anyways, this is thoroughly covered by existing tests now that we
actually verify things properly.

llvm-svn: 288795
2016-12-06 10:29:23 +00:00
Florian Hahn 7582c669bd [framelowering] Improve tracking of first CS pop instruction.
Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated.

Reviewers: mkuper, aprantl, MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27343

llvm-svn: 288794
2016-12-06 10:24:55 +00:00
Sam McCall 03435f57aa Add missing parens in assert.
Summary: Add missing parens in assert, which warn in GCC.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27448

llvm-svn: 288792
2016-12-06 10:14:36 +00:00
Chandler Carruth 8977223e55 [PM] Basic cleanups to CGSCC update code, NFC.
Just using InstIterator, simpler loop structures, and making better use
of the visit callback infrastructure.

llvm-svn: 288790
2016-12-06 10:06:06 +00:00
Craig Topper b34eef7b41 [X86] Remove another weird scalar sqrt/rcp/rsqrt pattern.
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still.

The only test case for this pattern is one I just added so there was no coverage of this.

llvm-svn: 288784
2016-12-06 08:08:12 +00:00
Craig Topper 26ce4267ef [X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case were the upper bits come from undef this is ok.  I believe a (vzmovl load64) would do the same thing but those seems to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction.

I will remove this pattern in a follow up patch. There appears to be no other test content for it.

llvm-svn: 288783
2016-12-06 08:08:09 +00:00
Craig Topper aa2c38378c [X86] Regenerate a test using update_llc_test_checks.py
llvm-svn: 288782
2016-12-06 08:08:07 +00:00
Craig Topper 683470bf1b [X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits.

llvm-svn: 288781
2016-12-06 08:08:04 +00:00