Commit Graph

97561 Commits

Author SHA1 Message Date
Simon Pilgrim fb58550d73 [X86][SSE] Move ZeroVector creation into the shuffle pattern case where its actually used.
Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........

llvm-svn: 289336
2016-12-10 19:49:55 +00:00
Craig Topper 18b57da491 [AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
2016-12-10 19:35:39 +00:00
Craig Topper 8e288e0b68 [X86] Clarify indentation. NFC
llvm-svn: 289334
2016-12-10 19:35:36 +00:00
Craig Topper 85f0e57c33 [X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag.
llvm-svn: 289333
2016-12-10 19:35:33 +00:00
Sanjay Patel 35289c62a8 [InstSimplify] improve function name; NFC
llvm-svn: 289332
2016-12-10 17:40:47 +00:00
Simon Atanasyan edd7a7bb40 [mips] Eliminate else-after-return. NFC
llvm-svn: 289331
2016-12-10 17:30:09 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Dylan McKay 41258cf07d [AVR] Add a stub README file
llvm-svn: 289326
2016-12-10 12:08:19 +00:00
Dylan McKay d8a603c23b [AVR] Fix and clean up the inline assembly tests
There was a bug where we would hit an assertion if 'Q' was used as a
constraint.

I also removed hardcoded register names to prefer regexes so the tests
don't break when the register allocator changes.

llvm-svn: 289325
2016-12-10 11:49:07 +00:00
Dylan McKay 801a4bd4ed [AVR] Fix an inline asm assertion which would always trigger
It looks like some time in the past, constraint codes were changed from
chars being passed around to enums.

llvm-svn: 289323
2016-12-10 11:18:37 +00:00
Dylan McKay 5c90b8cb4f [AVR] Use the register scavenger when expanding 'LDDW' instructions
Summary: This gets rid of the hardcoded 'r0' that was used previously.

Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27567

llvm-svn: 289322
2016-12-10 10:51:55 +00:00
Dylan McKay 5d0233bea2 [AVR] Support stores to undefined pointers
This would previously trigger an assertion error in AVRISelDAGToDAG.

llvm-svn: 289321
2016-12-10 10:16:13 +00:00
Chandler Carruth 6b9816477b [PM] Support invalidation of inner analysis managers from a pass over the outer IR unit.
Summary:
This never really got implemented, and was very hard to test before
a lot of the refactoring changes to make things more robust. But now we
can test it thoroughly and cleanly, especially at the CGSCC level.

The core idea is that when an inner analysis manager proxy receives the
invalidation event for the outer IR unit, it needs to walk the inner IR
units and propagate it to the inner analysis manager for each of those
units. For example, each function in the SCC needs to get an
invalidation event when the SCC gets one.

The function / module interaction is somewhat boring here. This really
becomes interesting in the face of analysis-backed IR units. This patch
effectively handles all of the CGSCC layer's needs -- both invalidating
SCC analysis and invalidating function analysis when an SCC gets
invalidated.

However, this second aspect doesn't really handle the
LoopAnalysisManager well at this point. That one will need some change
of design in order to fully integrate, because unlike the call graph,
the entire function behind a LoopAnalysis's results can vanish out from
under us, and we won't even have a cached API to access. I'd like to try
to separate solving the loop problems into a subsequent patch though in
order to keep this more focused so I've adapted them to the API and
updated the tests that immediately fail, but I've not added the level of
testing and validation at that layer that I have at the CGSCC layer.

An important aspect of this change is that the proxy for the
FunctionAnalysisManager at the SCC pass layer doesn't work like the
other proxies for an inner IR unit as it doesn't directly manage the
FunctionAnalysisManager and invalidation or clearing of it. This would
create an ever worsening problem of dual ownership of this
responsibility, split between the module-level FAM proxy and this
SCC-level FAM proxy. Instead, this patch changes the SCC-level FAM proxy
to work in terms of the module-level proxy and defer to it to handle
much of the updates. It only does SCC-specific invalidation. This will
become more important in subsequent patches that support more complex
invalidaiton scenarios.

Reviewers: jlebar

Subscribers: mehdi_amini, mcrosier, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27197

llvm-svn: 289317
2016-12-10 06:34:44 +00:00
Craig Topper a39b650d72 [X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements.
Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements.

Similar things have already been done for other convert intrinsics.

llvm-svn: 289316
2016-12-10 06:02:48 +00:00
Dylan McKay f368509543 [AVR] Fix a bunch of incorrect assertion messages
These should've been checking whether the immediate is a 6-bit unsigned
integer.

If the immediate was '63', this would cause an assertion error which
shouldn't have occurred.

llvm-svn: 289315
2016-12-10 05:48:48 +00:00
Kostya Serebryany c05cb60369 [libFuzzer] test cleanup (3)
llvm-svn: 289314
2016-12-10 02:48:42 +00:00
Kostya Serebryany 832d39e9cc [libFuzzer] test cleanup (2)
llvm-svn: 289313
2016-12-10 02:47:00 +00:00
Kostya Serebryany 2f962fe5f7 [libFuzzer] test cleanup
llvm-svn: 289312
2016-12-10 02:45:56 +00:00
Kostya Serebryany 61be0f947d [libFuzzer] switch all libFuzzer tests to use -fsanitize-coverage=trace-pc-guard. Support for the previosly used instrumentation will be removed in the following changes
llvm-svn: 289311
2016-12-10 02:26:23 +00:00
Kostya Serebryany 1394ce2aa2 [libFuzzer] use __sanitizer_get_module_and_offset_for_pc to get the module name while printing the coverage
llvm-svn: 289310
2016-12-10 01:19:35 +00:00
Matt Arsenault 2402b95db0 AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.

llvm-svn: 289307
2016-12-10 00:52:50 +00:00
Matt Arsenault 4bd7236193 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Matt Arsenault f0c862594b AMDGPU: Fix vintrp disassembly
llvm-svn: 289292
2016-12-10 00:29:55 +00:00
Matt Arsenault 618b330dd0 AMDGPU: Change vintrp printing to better match sc
Some of the immediates need to be printed differently
eventually.

llvm-svn: 289291
2016-12-10 00:23:12 +00:00
Eugene Zelenko 2bc2f33ba2 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289282
2016-12-09 22:06:55 +00:00
Adrian Prantl 8fafb8d378 Fix LLVM's use of DW_OP_bit_piece in DWARF expressions.
LLVM's use of DW_OP_bit_piece is incorrect and a based on a
misunderstanding of the wording in the DWARF specification. The offset
argument of DW_OP_bit_piece refers to the offset into the location
that is on the top of the DWARF expression stack, and not an offset
into the source variable. This has since also been clarified in the
DWARF specification.

This patch fixes all uses of DW_OP_bit_piece to emit the correct
offset and simplifies the DwarfExpression class to semi-automaticaly
emit empty DW_OP_pieces to adjust the offset of the source variable,
thus simplifying the code using DwarfExpression.

While this is an incompatible bugfix, in practice I don't expect this
to be much of a problem since LLVM's old interpretation and the
correct interpretation of DW_OP_bit_piece differ only when there are
gaps in the fragmented locations of the described variables or if
individual fragments are smaller than a byte. LLDB at least won't
interpret locations with gaps in them because is has no way to present
undefined bits in a variable, and there is a high probability that an
old-form expression will be malformed when interpreted correctly,
because the DW_OP_bit_piece offset will be outside of the location at
the top of the stack.

As a nice side-effect, this patch enables us to use a more efficient
encoding for subregisters: In order to express a sub-register at a
non-zero offset we now use a DW_OP_bit_piece instead of shifting the
value into place manually.

This patch also adds missing test coverage for code paths that weren't
exercised before.

<rdar://problem/29335809>
Differential Revision: https://reviews.llvm.org/D27550

llvm-svn: 289266
2016-12-09 20:43:40 +00:00
Marek Olsak 23ae31cca0 AMDGPU/SI: Remove XNACK feature from CI
Summary: CI doesn't have XNACK.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27175

llvm-svn: 289263
2016-12-09 19:49:58 +00:00
Marek Olsak 0f55fbae6c AMDGPU/SI: Don't reserve XNACK when it's disabled
Summary:
This frees 2 additional scalar registers.

These are results from all of my 3 patches combined:

  Polaris:
    Spilled SGPRs: 2231 -> 1517 (-32.00 %)

  Tonga:
    Spilled SGPRs: 3829 -> 2608 (-31.89 %)
    Spilled VGPRs: 100 -> 84 (-16.00 %)

  Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader
  limited to 64 VGPRs.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27151

llvm-svn: 289262
2016-12-09 19:49:54 +00:00
Marek Olsak 693e9be918 AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects
Summary: This frees 2 scalar registers.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27150

llvm-svn: 289261
2016-12-09 19:49:48 +00:00
Marek Olsak 91f22fbf4f AMDGPU/SI: Allow using SGPRs 96-101 on VI
Summary:
There is no point in setting SGPRS=104, because VI allocates SGPRs
in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs
for general purposes.

Reviewers: tstellarAMD

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27149

llvm-svn: 289260
2016-12-09 19:49:40 +00:00
Paul Robinson 4fa7b57a1f [DWARF] Suppress .loc directives from CFI instructions
Like DBG_VALUE, these emit nothing to the .text section, and sometimes
have no source location specified.  Just ignore them.

Differential Revision: http://reviews.llvm.org/D27492

llvm-svn: 289256
2016-12-09 19:15:32 +00:00
Matt Arsenault 7b00cf4706 AMDGPU: Fix isTypeDesirableForOp for i16
This should do nothing for targets without i16.

llvm-svn: 289235
2016-12-09 17:57:43 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Matt Arsenault 52facf0195 AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions
Fixes assembler regressions.

llvm-svn: 289230
2016-12-09 17:49:11 +00:00
Matt Arsenault eb4a55e066 AMDGPU: Clean up instruction bits
Sort the instruction bits by type and make sure there is one
for each format.

Also cleanup namespaces.

llvm-svn: 289229
2016-12-09 17:49:08 +00:00
Sean Fertile 1c4109b4c2 [PPC] Add intrinsics for vector extract word and vector insert word.
Revision: https://reviews.llvm.org/D26547
llvm-svn: 289227
2016-12-09 17:21:42 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Tom Stellard 2a48433fcf AMDGPU/SI: Don't mark VINTRP instructions as mayLoad
Summary:
These instructions technically do read from memory, but the memory
is considered to be out of bounds for normal load/store instructions.

shader-db stats:

SGPRS: 1416075 -> 1413323 (-0.19 %)
VGPRS: 867413 -> 863935 (-0.40 %)
Spilled SGPRs: 1409 -> 1354 (-3.90 %)
Spilled VGPRs: 63 -> 63 (0.00 %)
Private memory VGPRs: 880 -> 880 (0.00 %)
Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread
Code Size: 37889052 -> 37897340 (0.02 %) bytes
LDS: 2147 -> 2147 (0.00 %) blocks
Max Waves: 279243 -> 280369 (0.40 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, mareko, arsenm

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27593

llvm-svn: 289219
2016-12-09 15:57:15 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Benjamin Kramer eedc4059c3 Plug another leak in the DWARF unittests, DIEInlineStrings are never destroyed.
llvm-svn: 289208
2016-12-09 13:33:41 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Craig Topper 38b1b5d44f [X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match.
sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case.

There is a test case that does depend on this behavior.

llvm-svn: 289193
2016-12-09 07:57:21 +00:00
Dylan McKay 18ae0f68f8 [AVR] Use a more appropriate integer type for wide IN/OUT instructions
We could previously select an integer which would hit an assertion error
in pseudo expansion.

The new type will also generate the appropriate fixups if needed, which
wasn't done beforehand.

llvm-svn: 289192
2016-12-09 07:49:14 +00:00
Dylan McKay a5d49dfbb3 [AVR] Add tests for a large number of pseudo instructions
This adds MIR tests for 24 pseudo instructions.

llvm-svn: 289191
2016-12-09 07:49:04 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Matt Arsenault 27c062932a AMDGPU: Select i16 instructions to VOP3 forms
These were selecting directly to the VOP2 form instead
of VOP3 like the i32 instructions. Fixes regressions in
future commits where an immediate isn't folded because it was
initially used for the second operand.

Because uniform 16-bit operations are promoted to i32, it's
difficult to get a simple testcase where this matters. Fold
failures in SIFoldOperands here tend to be hidden by commute
and fold in SIShrinkInstructions.

llvm-svn: 289189
2016-12-09 06:19:12 +00:00
Peter Collingbourne 8db7e5e4ee Re-commit r289184, "Support: Use a 64-bit seek in raw_fd_ostream::seek()." with a configure-time check for lseek64.
llvm-svn: 289187
2016-12-09 05:20:43 +00:00
Craig Topper c4f2b0996d [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
llvm-svn: 289186
2016-12-09 05:20:11 +00:00
Peter Collingbourne f74fcdd30c Revert r289184, we need more configury for Darwin and *BSD.
llvm-svn: 289185
2016-12-09 05:04:30 +00:00
Peter Collingbourne 08ba509266 Support: Use a 64-bit seek in raw_fd_ostream::seek().
llvm-svn: 289184
2016-12-09 04:57:19 +00:00
Davide Italiano 824d695231 [SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.
The motivating example is:

extern int patatino;
int goo() {
    int x = 0;
    for (int i = 0; i < 1000000; ++i) {
        x *= patatino;
    }
    return x;
}

Currently SCCP will not realize that this function returns always zero,
therefore will try to unroll and vectorize the loop at -O3 producing an
awful lot of (useless) code. With this change, it will just produce:

0000000000000000 <g>:
   xor    %eax,%eax
   retq

llvm-svn: 289175
2016-12-09 03:08:42 +00:00
Craig Topper 2aeb456425 [AVX-512] Add vpermilps/pd to load folding tables.
llvm-svn: 289173
2016-12-09 02:18:11 +00:00
Craig Topper 107b187d2a [Analysis] Fix typo in comment. NFC
llvm-svn: 289171
2016-12-09 02:18:04 +00:00
Kostya Serebryany 111e1d69e3 [libFuzzer] implement crash-resistant merge (https://github.com/google/sanitizers/issues/722). This is a first experimental variant that needs some more testing, thus not yet adding a lit test (but there are unit tests).
llvm-svn: 289166
2016-12-09 01:17:24 +00:00
Peter Collingbourne 8786754cc3 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Chandler Carruth 86f0bdf832 [LCG] Minor cleanup to the LCG walk over a function, NFC.
This just hoists the check for declarations up a layer which allows
various sets used in the walk to be smaller. Also moves the relevant
comments to match, and catches a few other cleanups in this code.

llvm-svn: 289163
2016-12-09 00:46:44 +00:00
Peter Collingbourne 7a1e5bbe4e Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Chris Bieneman 313b326bb6 [ObjectYAML] Support for DWARF debug_aranges
This patch adds support for round tripping DWARF debug_aranges in and out of YAML.

llvm-svn: 289161
2016-12-09 00:26:44 +00:00
Zia Ansari 394cef803a [InstSimplify] Add "X / 1.0" to SimplifyFDivInst.
Differential Revision: https://reviews.llvm.org/D27587

llvm-svn: 289153
2016-12-08 23:27:40 +00:00
Tim Northover b58346f2f2 GlobalISel: fall back gracefully for debug intrinsics.
Supporting them properly is a reasonably complex chunk of work, so to allow bot
testing before then we should at least be able to fall back to DAG ISel.

llvm-svn: 289150
2016-12-08 22:44:13 +00:00
Tim Northover 1e656ec137 GlobalISel: factor overflow handling into separate function. NFC.
llvm-svn: 289149
2016-12-08 22:44:00 +00:00
Davide Italiano 54c683f9e7 [SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.
Currently SCCP folds the value to -1, while ConstantProp folds to
0. This changes SCCP to do what ConstantFolding does.

llvm-svn: 289147
2016-12-08 22:28:53 +00:00
Reid Kleckner 785e7d282c Don't emit .seh_handler directives for any cleanup funclets
We were falsely claiming that we had an LSDA for the relevant EH
personality before this change, which could lead to the EH machinery
interpreting random adjacent data as an LSDA.

Fixes PR31317

This change is safe because cleanups can't contain exception handlers
today. We do these things to maintain that invariant:
- C++ destructors are naturally out-of-line
- __finally blocks are outlined in clang
- LLVM's inliner will not inline EH constructs into cleanups

llvm-svn: 289101
2016-12-08 20:38:46 +00:00
Krzysztof Parzyszek 77a45576ef [RDF] Fix incorrect lane mask calculation
This was exposed by some code that used more than one level of sub-
registers. There is no testcase, because there is no such code in the
Hexagon backend.

llvm-svn: 289099
2016-12-08 20:33:45 +00:00
Matt Arsenault e96d03745d AMDGPU: Make f16 ConstantFP legal
Not having this legal led to combine failures, resulting
in dumb things like bitcasts of constants not being folded
away.

The only reason I'm leaving the v_mov_b32 hack that f32
already uses is to avoid madak formation test regressions.
PeepholeOptimizer has an ordering issue where the immediate
fold attempt is into the sgpr->vgpr copy instead of the actual
use. Running it twice avoids that problem.

llvm-svn: 289096
2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin 73b54f4134 [AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use
Differential Revision: https://reviews.llvm.org/D27225

llvm-svn: 289095
2016-12-08 20:07:23 +00:00
Matt Arsenault 6c06a6f48a AMDGPU: Fix commuting v_sub_u16
The correct commutable opcode was set to itself, so this
was simply swapping the operands to commute instead of also
changing the opcode to v_subrev_u16.

llvm-svn: 289093
2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin 50ea93a2bd [AMDGPU] Add amdgpu-unify-metadata pass
Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates semantic problem because we cannot tell which
version of OpenCL the composite module conforms.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and stored on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.

The pass unifies metadata after linking.

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092
2016-12-08 19:46:04 +00:00
Peter Collingbourne 235c275b20 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00
Chris Bieneman fbf7dfe1ba [ObjectYAML] Remove DWARF from class names
Since all the DWARF classes are in a DWARFYAML namespace having every class start with DWARF seems like a bit of overkill.

llvm-svn: 289080
2016-12-08 17:46:57 +00:00
Alexander Timofeev 18009560c5 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076
2016-12-08 17:28:47 +00:00
Keno Fischer dc09119776 ConstantFolding: Don't crash when encountering vector GEP
ConstantFolding tried to cast one of the scalar indices to a vector
type. Instead, use the vector type only for the first index (which
is the only one allowed to be a vector) and use its scalar type
otherwise.

Fixes PR31250.

Reviewers: majnemer
Differential Revision: https://reviews.llvm.org/D27389

llvm-svn: 289073
2016-12-08 17:22:35 +00:00
NAKAMURA Takumi 689493bb12 Prune unused libdeps.
llvm-svn: 289060
2016-12-08 15:28:02 +00:00
NAKAMURA Takumi 9ccd966612 LanaiInstPrinter: Prune unused libdeps.
llvm-svn: 289054
2016-12-08 14:26:30 +00:00
Nicolai Haehnle f08dc90253 [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
Summary:
Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types.

Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and
a sequence of half-width multiplications otherwise. Promote uses a wider
multiply.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD

Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24956

llvm-svn: 289050
2016-12-08 14:08:14 +00:00
Nicolai Haehnle 2857dc3893 AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg
Summary:
Without the fix to isFrameOffsetLegal to consider the instruction's
immediate offset, the new test case hits the corresponding assertion in
resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a
different base register.

With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of
places because frame base registers are added where they're not needed.
This is addressed by properly implementing needsFrameBaseReg, which also
helps to avoid unnecessary zero frame indices in a bunch of other places.

Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D27344

llvm-svn: 289048
2016-12-08 14:08:02 +00:00
Daniel Jasper 0f77869d58 Move DwarfGenerator.cpp to unittests
So far it creates a test helper and so it should be moved there. It also
create a layering cycle between CodeGen and CodeGen/AsmPrinter, which
should be avoided.

Review: https://reviews.llvm.org/D27570
llvm-svn: 289044
2016-12-08 12:45:29 +00:00
Alexey Bataev 4f0d469d45 [SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.

Differential Revision: https://reviews.llvm.org/D27215

llvm-svn: 289043
2016-12-08 11:57:51 +00:00
Simon Pilgrim 413c8e217f Wdocumentation fix
llvm-svn: 289038
2016-12-08 10:41:41 +00:00
Oliver Stannard 68e7c21ca0 Add a comment consumer mechanism to MCAsmLexer
This allows clients to register an AsmCommentConsumer with the MCAsmLexer,
which receives a callback each time a comment is parsed.

Differential Revision: https://reviews.llvm.org/D27511

llvm-svn: 289036
2016-12-08 10:31:21 +00:00
Dylan McKay fac9ce5413 [AVR] Add an assertion to ensure we don't emit LPM when it's unsupported
llvm-svn: 289030
2016-12-08 08:34:13 +00:00
Peter Collingbourne f4257528e9 LTO: Hash the parts of the LTO configuration that affect code generation.
Most importantly, we need to hash the relocation model, otherwise we can
end up trying to link non-PIC object files into PIEs or DSOs.

Differential Revision: https://reviews.llvm.org/D27556

llvm-svn: 289024
2016-12-08 05:28:30 +00:00
Keno Fischer d4ea4c18f1 Revert "[CodeGen] Fix invalid DWARF info on Win64"
Appears to break on build bots. Reverting pending investigation.

llvm-svn: 289014
2016-12-08 01:56:23 +00:00
Keno Fischer 460218fb7d [CodeGen] Fix invalid DWARF info on Win64
The relocations for `DIEEntry::EmitValue` were wrong for Win64
(emitting FK_Data_4 instead of FK_SecRel_4). This corrects that
oversight so that the DWARF data is correct in Win64 COFF files.

Fixes PR15393.

Patch by Jameson Nash <jameson@juliacomputing.com> based on a patch
by David Majnemer.

Differential Revision: https://reviews.llvm.org/D21731

llvm-svn: 289013
2016-12-08 01:40:21 +00:00
Greg Clayton 3462a420d1 Make a DWARF generator so we can unit test DWARF APIs with gtest.
The only tests we have for the DWARF parser are the tests that use llvm-dwarfdump and expect output from textual dumps.

More DWARF parser modification are coming in the next few weeks and I wanted to add tests that can verify that we can encode and decode all form types, as well as test some other basic DWARF APIs where we ask DIE objects for their children and siblings.

DwarfGenerator.cpp was added in the lib/CodeGen directory. This file contains the code necessary to easily create DWARF for tests:

dwarfgen::Generator DG;
Triple Triple("x86_64--");
bool success = DG.init(Triple, Version);
if (!success)
  return;
dwarfgen::CompileUnit &CU = DG.addCompileUnit();
dwarfgen::DIE CUDie = CU.getUnitDIE();

CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

dwarfgen::DIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

dwarfgen::DIE IntDie = CUDie.addChild(DW_TAG_base_type);
IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

dwarfgen::DIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
// ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref4, IntDie);
ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie);

StringRef FileBytes = DG.generate();
MemoryBufferRef FileBuffer(FileBytes, "dwarf");
auto Obj = object::ObjectFile::createObjectFile(FileBuffer);
EXPECT_TRUE((bool)Obj);
DWARFContextInMemory DwarfContext(*Obj.get());
This code is backed by the AsmPrinter code that emits DWARF for the actual compiler.

While adding unit tests it was discovered that DIEValue that used DIEEntry as their values had bugs where DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref8, and DW_FORM_ref_udata forms were not supported. These are all now supported. Added support for DW_FORM_string so we can emit inlined C strings.

Centralized the code to unique abbreviations into a new DIEAbbrevSet class and made both the dwarfgen::Generator and the llvm::DwarfFile classes use the new class.

Fixed comments in the llvm::DIE class so that the Offset is known to be the compile/type unit offset.

DIEInteger now supports more DW_FORM values.

There are also unit tests that cover:

Encoding and decoding all form types and values
Encoding and decoding all reference types (DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, DW_FORM_ref8, DW_FORM_ref_udata, DW_FORM_ref_addr) including cross compile unit references with that go forward one compile unit and backward on compile unit.

Differential Revision: https://reviews.llvm.org/D27326

llvm-svn: 289010
2016-12-08 01:03:48 +00:00
Evgeniy Stepanov 0c8957c198 CFI-icall on Thumb
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.

This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc.

Differential Revision: https://reviews.llvm.org/D27499

llvm-svn: 289008
2016-12-08 00:32:26 +00:00
Matthias Braun e2d2ead661 TargetPassConfig: Rename DisablePostRA -> DisablePostRASched; NFC
llvm-svn: 289003
2016-12-08 00:16:08 +00:00
Matthias Braun 0c989a893b LivePhysReg: Use reference instead of pointer in init(); NFC
llvm-svn: 289002
2016-12-08 00:15:51 +00:00
Quentin Colombet ae3168da3f [InlineSpiller] Don't call TargetInstrInfo::foldMemoryOperand with an empty list.
Since r287792 if we try to do that we will hit an assert.

llvm-svn: 289001
2016-12-08 00:06:51 +00:00
Eugene Zelenko 9408c61830 [ADT, IR] Fix some Clang-tidy modernize-use-equals-delete and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 288989
2016-12-07 22:06:02 +00:00
Davide Italiano 1ed5396304 [BDCE] Skip metadata while replacing uses.
The fix committed in r288851 doesn't cover all the cases.
In particular, if we have an instruction with side effects
which has a no non-dbg use not depending on the bits, we still
perform RAUW destroying the dbg.value's first argument.
Prevent metadata from being replaced here to avoid the issue.

Differential Revision:  https://reviews.llvm.org/D27534

llvm-svn: 288987
2016-12-07 21:47:32 +00:00
Tim Northover c53606ef02 GlobalISel: use correct builder for ConstantExprs.
ConstantExpr instances were emitting code into the current block rather than
the entry block. This meant they didn't necessarily dominate all uses, which is
clearly wrong.

llvm-svn: 288985
2016-12-07 21:29:15 +00:00
Chris Bieneman 79e60eb948 [ObjectYAML] Pull DWARF support into DWARFYAML namespace
Since DWARF formatting is agnostic to the object file it is stored in, it doesn't make sense for this to be in the MachOYAML implementation. Pulling it into its own namespace means we could modify the ELF and COFF YAML tools to emit DWARF as well.

In a follow-up patch I will better abstract this in obj2yaml and yaml2obj so that the DWARF bits in the tools can be re-used too.

llvm-svn: 288984
2016-12-07 21:26:32 +00:00
Tim Northover 50db7f416c GlobalISel: store the current MachineFunction as direct state. NFC.
Having to ask the MIRBuilder for the current function is a little awkward, and
I'm intending to improve how that's threaded through anyway.

llvm-svn: 288983
2016-12-07 21:17:47 +00:00
Chris Bieneman 25ec226dfc [ObjectYAML] Rename DWARF entries to match section names
This change makes the yaml tags for the members of the DWARF data match the names of the DWARF sections.

llvm-svn: 288981
2016-12-07 21:09:37 +00:00
Tim Northover 05cc4859ad GlobalISel: simplify MachineIRBuilder interface.
MachineIRBuilder had weird before/after and beginning/end flags for the insert
point. Unfortunately the non-default means that instructions will be inserted
in reverse order which is almost never what anyone wants.

Really, I think we just want (like IRBuilder has) the ability to insert at any
C++ iterator-style point (i.e. before any instruction or before MBB.end()). So
this fixes MIRBuilders to behave like IRBuilders in this respect.

llvm-svn: 288980
2016-12-07 21:05:38 +00:00
Kostya Serebryany 64a055549a [libFuzzer] include FuzzerIO.h and hopefully fix the Mac build. reported by Dejan Mircevski
llvm-svn: 288979
2016-12-07 21:02:48 +00:00
Matt Arsenault 624e1b348c InstCombine: Fold bitcast of vector to FP scalar
llvm-svn: 288978
2016-12-07 20:56:11 +00:00
Eli Friedman c6885fc369 [GVNHoist] Invalidate MemDep when an instruction is moved.
See also r279907.

Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 .

Differential Revision: https://reviews.llvm.org/D27493

llvm-svn: 288968
2016-12-07 19:55:59 +00:00
Michael Kuperstein 5842b20633 [X86] Skip over DEBUG_VALUE while looking for start of call sequence
If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g
code.

This fixes PR31242.

Differential Revision: https://reviews.llvm.org/D27485

llvm-svn: 288965
2016-12-07 19:31:08 +00:00
Michael Kuperstein 18092cf2c3 [X86] Do not assume "ri" instructions always have an immediate operand
The second operand of an "ri" instruction may be an immediate, but it may
also be a globalvariable, so we should make any assumptions.

This fixes PR31271.

Differential Revision: https://reviews.llvm.org/D27481

llvm-svn: 288964
2016-12-07 19:29:18 +00:00
Chris Bieneman bfff254a10 Fix the apple build issue caused by r288956
Should be checking if HAVE_CRASHREPORTERCLIENT_H is defined not relying on it having a value.

llvm-svn: 288963
2016-12-07 19:28:22 +00:00
Chris Bieneman c6c0e54d3d [ObjectYAML] Support for DWARF __debug_abbrev section
This patch adds support for round-tripping DWARF debug abbreviations through the obj<->yaml tools.

llvm-svn: 288955
2016-12-07 18:52:59 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim c3c6463ce0 [X86][SSE] Remove AND -> VZEXT combine
This is now performed more generally by the target shuffle combine code.

Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....).

Differential Revision: https://reviews.llvm.org/D27510

llvm-svn: 288918
2016-12-07 17:02:41 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim ff79f31328 [SelectionDAG] Removed old knownbits TODO comment. NFCI.
EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range.

llvm-svn: 288913
2016-12-07 15:31:12 +00:00
Matthew Simpson 364da7e527 [LV] Scalarize operands of predicated instructions
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.

Differential Revision: https://reviews.llvm.org/D26083

llvm-svn: 288909
2016-12-07 15:03:32 +00:00
Benjamin Kramer b1332d8bf6 Try unbreaking the MSVC build.
llvm-svn: 288907
2016-12-07 13:35:11 +00:00
Dylan McKay 99b756eb40 [AVR] Expand 'SELECT_CC' nodes whereever possible
llvm-svn: 288905
2016-12-07 12:34:47 +00:00
Benjamin Kramer 926ab5b00b [LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects.
Also avoid allocating ~3x as much memory as needed.

llvm-svn: 288904
2016-12-07 12:31:45 +00:00
Andrea Di Biagio ae5780104f When GVN removes a redundant load, it should not modify the debug location of the dominating load.
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk to
introduce an incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.

Differential Revision: https://reviews.llvm.org/D27468

llvm-svn: 288903
2016-12-07 12:31:36 +00:00
Simon Pilgrim 8893bd95f0 [X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain
We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain.

Differential Revision: https://reviews.llvm.org/D27419

llvm-svn: 288902
2016-12-07 12:10:49 +00:00
Andrea Di Biagio eff22832c0 [InlineFunction] Refactor code in function `fixupLineNumbers' as suggested by David in D27462. NFC
llvm-svn: 288901
2016-12-07 12:01:45 +00:00
Simon Dardis 615bac37cd [mips][rtdyld] Merge code to write relocated values to the section. NFC
Preparation work for implementing N32 support.

Patch By: Daniel Sanders

Reviewers: vkalintiris, atanasyan

Differential Revision: https://reviews.llvm.org/D27460

llvm-svn: 288900
2016-12-07 11:41:23 +00:00
Simon Pilgrim d5bc5c16b2 [X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296)
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.

Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.

llvm-svn: 288898
2016-12-07 11:19:00 +00:00
Dylan McKay 8cec7eb6dd [AVR] Allow loading from stack slots where src and dest registers are identical
Fixes PR 31256

llvm-svn: 288897
2016-12-07 11:08:56 +00:00
Andrea Di Biagio 32d5aedd5b [InlineFunction] Do not propagate the callsite debug location to instructions inlined from functions with debug info.
When a function F is inlined, InlineFunction extends the debug location of every
instruction inlined from F by adding an InlinedAt.

However, if an instruction has a 'null' debug location, InlineFunction would
propagate the callsite debug location to it. This behavior existed since
revision 210459.

Revision 210459 was originally committed specifically to workaround the lack of
debug information for instructions inlined from intrinsic functions (which are
usually declared with attributes `__always_inline__, __nodebug__`).

The problem with revision 210459 is that it doesn't make any sort of distinction
between instructions inlined from a 'nodebug' function and instructions which
are inlined from a function built with debug info. This issue may lead to
incorrect stepping in the debugger.

This patch works under the assumption that a nodebug function does not have a
DISubprogram. When a function F is inlined into another function G,
InlineFunction checks if F has debug info associated with it.

For nodebug functions, the InlineFunction logic is unchanged (i.e. it would
still propagate the callsite debugloc to the inlined instructions). Otherwise,
InlineFunction no longer propagates the callsite debug location.

Differential Revision: https://reviews.llvm.org/D27462

llvm-svn: 288895
2016-12-07 10:37:26 +00:00
Philip Reames 02bb6a6b0b Reintroduce a check accidentally removed in 288873 to fix clang bots
I believe this is the cause of the failure, but have not been able to confirm.  Note that this is a speculative fix; I'm still waiting for a full build to finish as I synced and ended up doing a clean build which takes 20+ minutes on my machine.

llvm-svn: 288886
2016-12-07 04:48:50 +00:00
Philip Reames 29b19f0e9e Fix a warning introduced in r288874
llvm-svn: 288884
2016-12-07 04:11:22 +00:00
Tom Stellard 8485fa096e AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
Patch By: Wei Ding

Summary: This patch fixes the fdiv precision issues.

Reviewers: b-sumner, cfang, wdng, arsenm

Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26424

llvm-svn: 288879
2016-12-07 02:42:15 +00:00
Haicheng Wu f8b834049a [AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()
In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511].

Differential Revision: https://reviews.llvm.org/D27480

llvm-svn: 288876
2016-12-07 01:45:04 +00:00
Chandler Carruth 5205c35075 [LCG] Add basic verification of the parent set and fix bugs it uncovers.
The existing unittests actually cover this now that we verify things.

llvm-svn: 288875
2016-12-07 01:42:40 +00:00
Philip Reames 71a496777c [LVI] Remove used return value from markX functions
llvm-svn: 288874
2016-12-07 01:03:56 +00:00
Philip Reames b47a719ac0 [LVI] Simplify mergeIn code
Remove the unused return type, use early return, use assignment operator.

llvm-svn: 288873
2016-12-07 00:54:21 +00:00
Philip Reames 864ab5c516 [LVI] Simplify obfuscated code
It doesn't matter why something is overdefined if it is...

llvm-svn: 288871
2016-12-07 00:28:28 +00:00
Peter Collingbourne 6f0b4f2e89 IR: Reduce the amount of boilerplate required for a metadata kind. NFCI.
llvm-svn: 288867
2016-12-06 23:53:01 +00:00
Tom Stellard 2187bb8a89 AMDGPU: Add llvm.amdgcn.interp.mov intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26725

llvm-svn: 288865
2016-12-06 23:52:13 +00:00
Matt Arsenault 269ffdac4e AMDGPU: Fix crash on i16 constant expression
llvm-svn: 288861
2016-12-06 23:18:06 +00:00
Peter Collingbourne 7357b2ad62 LowerTypeTests: Improve performance by optimising type metadata queries.
Requesting metadata for a global is a relatively expensive operation as it
involves a map lookup, but it's one that we need to do relatively frequently in
this pass to collect the list of type metadata nodes associated with a global.
This change improves the performance of type metadata queries by prebuilding
data structures that keep the global together with its list of type metadata,
and changing the pass to use that data structure wherever we were previously
passing global references around.

This change also eliminates some O(N^2) behavior by collecting the list of
globals associated with each type identifier during the first pass over the
list of globals rather than visiting each global to compute that list every
time we add a new type identifier.

Reduces pass runtime on a module containing Chrome's vtables from over 60s
to 0.9s.

Differential Revision: https://reviews.llvm.org/D27484

llvm-svn: 288859
2016-12-06 23:02:13 +00:00
Eli Friedman 0a76e3241f [CodeGen] Fix result type for SMULO/UMULO legalization
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.

This fixes issue https://github.com/rust-lang/rust/issues/37829

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D27154

llvm-svn: 288857
2016-12-06 22:49:36 +00:00
Matt Arsenault ac066f354a AMDGPU: Fix operand name for v_interp_*
Other VOP instructions call the output vdst

llvm-svn: 288856
2016-12-06 22:29:43 +00:00
Sanjay Patel 5369775a84 [InstSimplify] fixed (?) to not mutate icmps
As Eli noted in the post-commit thread for r288833, the use of
swapOperands() may not be allowed in InstSimplify, so I'm 
removing those calls here pending further review. 

The swap mutates the icmp, and there doesn't appear to be precedent
for instruction mutation in InstSimplify.

I didn't actually have any tests for those cases, so I'm adding
a few here. 

llvm-svn: 288855
2016-12-06 22:09:52 +00:00
Tom Stellard 175959e350 AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27416

llvm-svn: 288852
2016-12-06 21:53:10 +00:00
Davide Italiano 043e66137c [BDCE/DebugInfo] Preserve llvm.dbg.value's argument.
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.

Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17

but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.

Differential Revision:  https://reviews.llvm.org/D27471

llvm-svn: 288851
2016-12-06 21:52:47 +00:00
Tom Stellard 00cfa74715 AMDGPU/SI: Don't move copies of immediates to the VALU
Summary:
If we write an immediate to a VGPR and then copy the VGPR to an
SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than
moving the copy to the SALU.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27272

llvm-svn: 288849
2016-12-06 21:13:30 +00:00
Tim Northover 14ceb45fb4 GlobalISel: correctly handle small args via memory.
We were rounding size in bits down rather than up, leading to 0-sized slots for
i1 (assert!) and bugs for other types not byte-aligned.

llvm-svn: 288848
2016-12-06 21:02:19 +00:00
Zvi Rackover 8bc7e4da51 [X86] Prefer reduced width multiplication over pmulld on Silvermont
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.

Reviewers: wmi, delena, mkuper

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D27203

llvm-svn: 288844
2016-12-06 19:35:20 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Sanjay Patel 9b1b2de348 [InstSimplify] add folds for and-of-icmps with same operands
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.

This is the 'and' sibling of the earlier 'or' patch:
https://reviews.llvm.org/rL288833

llvm-svn: 288841
2016-12-06 19:05:46 +00:00
Tim Northover 0a683e7bfd GlobalISel: fall back gracefully when we hit unhandled legalizer default.
llvm-svn: 288840
2016-12-06 19:02:15 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Tim Northover c1a23854f3 GlobalISel: handle G_SEQUENCE fallbacks gracefully.
There were two problems:
  + AArch64 was reusing random data from its binary op tables, which is
    complete nonsense for G_SEQUENCE.
  + Even when AArch64 gave up and said it couldn't handle G_SEQUENCE,
    the generic code asserted.

llvm-svn: 288836
2016-12-06 18:38:38 +00:00
Tim Northover f50f2f3d32 GlobalISel: allow G_SELECT instructions for pointers.
llvm-svn: 288835
2016-12-06 18:38:34 +00:00
Tim Northover 405e25cd6a GlobalISel: stop the legalizer from trying to handle oddly-sized types.
It'll almost immediately fail because it always tries to half/double the size
until it finds a legal one. Unfortunately, this triggers an assertion
preventing the DAG fallback from being possible.

llvm-svn: 288834
2016-12-06 18:38:29 +00:00