Commit Graph

47843 Commits

Author SHA1 Message Date
Matt Arsenault b81495dccb AMDGPU: Match load d16 hi instructions
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.

We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.

llvm-svn: 313716
2017-09-20 05:01:53 +00:00
Stanislav Mekhanoshin 5670e6d482 [AMDGPU] Port of HSAIL inliner
Differential Revision: https://reviews.llvm.org/D36849

llvm-svn: 313714
2017-09-20 04:25:58 +00:00
Matt Arsenault fcc213fab7 AMDGPU: Match store d16_hi instructions
llvm-svn: 313712
2017-09-20 03:20:09 +00:00
Sanjoy Das 09613b122e Tighten the invariants around LoopBase::invalidate
Summary:
With this change:
 - Methods in LoopBase trip an assert if the receiver has been invalidated
 - LoopBase::clear frees up the memory held the LoopBase instance

This change also shuffles things around as necessary to work with this stricter invariant.

Reviewers: chandlerc

Subscribers: mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38055

llvm-svn: 313708
2017-09-20 02:31:57 +00:00
Mike Edwards b487bf45f0 Reverting due to Green Dragon bot failure.
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42594/

llvm-svn: 313706
2017-09-20 01:21:02 +00:00
Quentin Colombet d652aeb144 [MIRPrinter] Print empty successor lists when they cannot be guessed
This re-applies commit r313685, this time with the proper updates to
the test cases.

Original commit message:
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).

For instance, the following test case used to fail the verifier.
The MIR printer would print

         entry
        /      \
   true (def)   false (no list of successors)
       |
 split.true (use)

The MIR parser would understand this:

         entry
        /      \
   true (def)   false
       |        /  <-- invalid edge
 split.true (use)

Because of the invalid edge, we get the "def does not
dominate all uses" error.

The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.

rdar://problem/34022159

llvm-svn: 313696
2017-09-19 23:34:12 +00:00
Sam Clegg b292c25966 [WebAssembly] Add support for naming wasm data segments
Add adds support for naming data segments.  This is useful
useful linkers so that they can merge similar sections.

Differential Revision: https://reviews.llvm.org/D37886

llvm-svn: 313692
2017-09-19 23:00:57 +00:00
Quentin Colombet 6888dbcda7 Revert "[MIRPrinter] Print empty successor lists when they cannot be guessed"
This reverts commit r313685.

I thought I had ran ninja check, but apparently I didn't...
Need to update a bunch of mir tests.

llvm-svn: 313686
2017-09-19 22:03:50 +00:00
Quentin Colombet 7fdaa5e641 [MIRPrinter] Print empty successor lists when they cannot be guessed
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).

For instance, the following test case used to fail the verifier.
The MIR printer would print
          entry
         /      \
    true (def)   false (no list of successors)
        |
  split.true (use)

The MIR parser would understand this:
          entry
         /      \
    true (def)   false
        |        /  <-- invalid edge
  split.true (use)

Because of the invalid edge, we get the "def does not
dominate all uses" error.

The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.

rdar://problem/34022159

llvm-svn: 313685
2017-09-19 21:55:51 +00:00
Jake Ehrlich d246b0a284 Reland "[llvm-objcopy] Add support for nested and overlapping segments"
I didn't initialize a pointer to be nullptr that I needed to.

This change adds support for nested and even overlapping segments. This means
that PT_PHDR, PT_GNU_RELRO, PT_TLS, and PT_DYNAMIC can be supported properly.

Differential Revision: https://reviews.llvm.org/D36558

llvm-svn: 313682
2017-09-19 21:37:35 +00:00
Jonathan Roelofs 85908aa84b [ARM] Relax 'cpsie'/'cpsid' flag parsing.
The ARM docs suggest in examples that the flags can have either case, and there
are applications in the wild that (libopencm3, for example) that expect to be
able to use the uppercase spelling.

https://reviews.llvm.org/D37953

llvm-svn: 313680
2017-09-19 21:23:19 +00:00
Reid Kleckner ffdf087499 Revert "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
This reverts r313640, originally r313400, one more time for essentially
the same issue. My BitVector of spilled location numbers isn't working
because we coalesce identical DBG_VALUE locations as we rewrite them,
invalidating the location numbers used to index the BitVector.

llvm-svn: 313679
2017-09-19 21:18:32 +00:00
Dehao Chen 62b9c33e1e Import all inlined indirect call targets for SamplePGO.
Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36637

llvm-svn: 313678
2017-09-19 21:18:14 +00:00
Adrian Prantl ccc21c459b llvm-dwarfdump: un-hide more command line options
llvm-svn: 313673
2017-09-19 20:58:57 +00:00
Adrian Prantl b780d909fe Move test into non-target-specific directory.
llvm-svn: 313672
2017-09-19 20:58:56 +00:00
Stanislav Mekhanoshin d4ae470d2e [AMDGPU] Prevent post-RA scheduler from breaking memory clauses
The pre-RA scheduler does load/store clustering, but post-RA
scheduler undoes it. Add mutation to prevent it.

Differential Revision: https://reviews.llvm.org/D38014

llvm-svn: 313670
2017-09-19 20:54:38 +00:00
Ulrich Weigand 59a01a958a [SystemZ] Fix truncstore + bswap codegen bug
SystemZTargetLowering::combineSTORE contains code to transform a
combination of STORE + BSWAP into a STRV type instruction.

This transformation is correct for regular stores, but not for
truncating stores.  The routine neglected to check for that case.

Fixes a miscompilation of llvm-objcopy with clang, which caused
test suite failures in the SystemZ multistage build bot.

llvm-svn: 313669
2017-09-19 20:50:05 +00:00
Saleem Abdulrasool 5c37d57e15 Revert "ExecutionEngine: add R_AARCH64_ABS{16,32}"
This reverts commit SVN r313654.  Seems that it is triggering an
assertion on Windows specifically.  Revert until I can build on Windows
and look into what is happening there.

llvm-svn: 313668
2017-09-19 20:35:25 +00:00
Jake Ehrlich 317782122c Revert "[llvm-objcopy] Add support for .dynamic, .dynsym, and .dynstr"
This reverts commit r313663. Broken because overlapping-sections was
reverted.

llvm-svn: 313665
2017-09-19 20:00:04 +00:00
Jake Ehrlich 8f108248ba Revert "[llvm-objcopy] Add support for nested and overlapping segments"
This reverts commit r313656. Appears to be broken on Windows.

llvm-svn: 313664
2017-09-19 19:52:09 +00:00
Jake Ehrlich f20c3f4333 [llvm-objcopy] Add support for .dynamic, .dynsym, and .dynstr
This change adds support for sections involved in dynamic loading such
as SHT_DYNAMIC, SHT_DYNSYM, and allocated string tables.

The two added binaries used for tests can be downloaded [[
https://drive.google.com/file/d/0B3gtIAmiMwZXOXE3T0RobFg4ZTg/view?usp=sharing
| here ]] and [[
https://drive.google.com/file/d/0B3gtIAmiMwZXTFJSQUJZMGxNSXc/view?usp=sharing
| here ]]

Differential Revision: https://reviews.llvm.org/D36560

llvm-svn: 313663
2017-09-19 19:21:09 +00:00
David Blaikie 610e03a009 Fix test to not depend on another subdirectories Input directory
Inputs should be placed local to the test (or possibly in a common
parent? I think we do that in some places - but the only common parent
between these two directories is 'test' which seems a bit overly broad).

llvm-svn: 313662
2017-09-19 19:20:08 +00:00
Jake Ehrlich 523560ef57 [llvm-objcopy] Add test to check that architecture specific values are not used on wrong architecture.
This change adds a test that checks the an error is produced when a hexagon
specific reserved section index is used but e_machine is not EM_HEXAGON.

Differential Revision: https://reviews.llvm.org/D38017

llvm-svn: 313661
2017-09-19 19:05:15 +00:00
Dehao Chen b6e60c8b80 Handle profile mismatch correctly for SamplePGO.
Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash.

Reviewers: davidxl, tejohnson, eraman

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D38018

llvm-svn: 313658
2017-09-19 18:26:54 +00:00
Reid Kleckner 26fa1bf4da Re-land "Fix Bug 30978 by emitting cv file checksums."
This reverts r313431 and brings back r313374 with a fix to write
checksums as binary data and not ASCII hex strings.

llvm-svn: 313657
2017-09-19 18:14:45 +00:00
Jake Ehrlich 0a84b1ac80 [llvm-objcopy] Add support for nested and overlapping segments
This change adds support for nested and even overlapping segments. This means
that PT_PHDR, PT_GNU_RELRO, PT_TLS, and PT_DYNAMIC can be supported properly.

Differential Revision: https://reviews.llvm.org/D36558

llvm-svn: 313656
2017-09-19 18:14:03 +00:00
Saleem Abdulrasool 6567ecd741 ExecutionEngine: add R_AARCH64_ABS{16,32}
Add support for the R_AARCH64_ABS{16,32} relocations in the execution
engine.  This is primarily used for DWARF debug information relocations
and needed by the LLVM JIT to support JITing for lldb.

Patch by Alex Langford!

llvm-svn: 313654
2017-09-19 18:00:50 +00:00
Reid Kleckner 86c74dd791 Re-land r313400 "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
I forgot to zero out the BitVector when reusing it between UserValues.

Later uses of the same location number for a different UserValue would
falsely indicate that they were spilled. Usually this would lead to
incorrect debug info, but in some cases they would indicate something
nonsensical like a memory location based on a vector register (Q8 on
ARM).

llvm-svn: 313640
2017-09-19 16:32:15 +00:00
Tony Jiang 2d9c5f3b8b [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD.
Two blocks prior to the join each perform an li and the the join block has an
add using the initialized register. Optimize each predecessor block to instead
use addi and delete the li's and add.

Differential Revision: https://reviews.llvm.org/D36734

llvm-svn: 313639
2017-09-19 16:14:37 +00:00
Evandro Menezes 0a98abc67c [AArch64] Extend tests of loads and stores of register pairs
Include instances of FP register pairs.

llvm-svn: 313638
2017-09-19 15:46:35 +00:00
Tony Jiang 425071eff3 [Power9] Add missing Power9 instructions.
The following 8 instructions are implemented in this patch.
addpcis(subpcis, lnia), darn, maddhd, maddhdu, maddld, setb

llvm-svn: 313636
2017-09-19 15:22:36 +00:00
Daniel Sanders 83e23d1398 [globalisel] Add a G_BSWAP instruction and support bswap using it.
llvm-svn: 313633
2017-09-19 14:25:15 +00:00
Simon Pilgrim d5e2878252 [X86][SSE] Add 'redundant pand' test case from PR34620
llvm-svn: 313632
2017-09-19 14:02:16 +00:00
Sanjay Patel bd7958d7ca [x86] regenerate checks; NFC
llvm-svn: 313631
2017-09-19 13:43:09 +00:00
Alexey Bataev 10ad32b76d [SLP] Reduce test, NFC.
llvm-svn: 313630
2017-09-19 13:38:56 +00:00
Daniel Sanders 000327742f [globalisel] Add support for intrinsic_void
llvm-svn: 313629
2017-09-19 13:23:01 +00:00
Daniel Sanders 28887fe548 [globalisel] Add support for intrinsic_w_chain.
This maps directly to G_INTRINSIC_W_SIDE_EFFECTS.

llvm-svn: 313627
2017-09-19 12:56:36 +00:00
Jina Nahias ccfb8d4fe8 [x86] Lowering Mask Set1 intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37669

llvm-svn: 313625
2017-09-19 11:03:06 +00:00
Roger Ferrer Ibanez 8d0180c955 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045
 - fixes PR34564

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313618
2017-09-19 09:05:39 +00:00
Andrei Elovikov 142516b456 Test commit.
llvm-svn: 313617
2017-09-19 07:56:20 +00:00
Matt Arsenault e745d9963e AMDGPU: Run internalize symbols at -O0
The relocations used for externally visible functions
aren't supported, so the direct call emitted ends
up hitting a linker error.

llvm-svn: 313616
2017-09-19 07:40:11 +00:00
Gadi Haber 6f8fbf4b86 [X86][Skylake] Adding the scheduling information for the SkylakeClient target
This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena
Differential Revision: https://reviews.llvm.org/D37294

llvm-svn: 313613
2017-09-19 06:19:27 +00:00
Craig Topper a80949feb5 [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table.
llvm-svn: 313610
2017-09-19 04:39:55 +00:00
Vedant Kumar b7fdaf2cd4 [llvm-cov] Make report metrics agree with line exec counts, fixes PR34615
Use the same logic as the line-oriented coverage view to determine the
number of covered lines in a function.

Fixes llvm.org/PR34615.

llvm-svn: 313604
2017-09-19 02:00:12 +00:00
Vedant Kumar ad8f637bd8 [Coverage] Use gap regions to select better line exec counts
After clang started emitting deferred regions (r312818), llvm-cov has
had a hard time picking reasonable line execuction counts. There have
been one or two generic improvements in this area (e.g r310012), but
line counts can still report coverage for whitespace instead of code
(llvm.org/PR34612).

To fix the problem:

 * Introduce a new region kind so that frontends can explicitly label
   gap areas.

   This is done by changing the encoding of the columnEnd field of
   MappingRegion. This doesn't substantially increase binary size, and
   makes it easy to maintain backwards-compatibility.

 * Don't set the line count to a count from a gap area, unless the count
   comes from a wrapped segment.

 * Don't highlight gap areas as uncovered.

Fixes llvm.org/PR34612.

llvm-svn: 313597
2017-09-18 23:37:28 +00:00
Vedant Kumar 27d1b14ab0 [llvm-cov] Repair a test. NFC.
The checks with the MARKER prefix were not being run over the right
input, because stderr was not redirected properly.

llvm-svn: 313596
2017-09-18 23:37:27 +00:00
Yonghong Song 9ef85f0677 bpf: add inline-asm support
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 313593
2017-09-18 23:29:36 +00:00
Yi Kong bb4b4eef61 [ThinLTO/gold] Implement ThinLTO cache pruning support
Differential Revision: https://reviews.llvm.org/D37993

llvm-svn: 313592
2017-09-18 23:24:55 +00:00
Hans Wennborg 4de620ab3d Revert r313400 "[DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs"
This caused asserts in Chromium. See http://crbug.com/766261

> Summary:
> This comes up in optimized debug info for C++ programs that pass and
> return objects indirectly by address. In these programs,
> llvm.dbg.declare survives optimization, which causes us to emit indirect
> DBG_VALUE instructions. The fast register allocator knows to insert
> DW_OP_deref when spilling indirect DBG_VALUE instructions, but the
> LiveDebugVariables did not until this change.
>
> This fixes part of PR34513. I need to look into why this doesn't work at
> -O0 and I'll send follow up patches to handle that.
>
> Reviewers: aprantl, dblaikie, probinson
>
> Subscribers: qcolombet, hiraditya, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D37911

llvm-svn: 313589
2017-09-18 23:08:42 +00:00
Zachary Turner d4401d354a [lit] Update clang and lld to use new config helpers.
NFC intended here, this only updates clang and lld's lit configs
to use some helper functionality in the lit.llvm submodule.

llvm-svn: 313579
2017-09-18 22:26:48 +00:00
Sanjay Patel f31b1a00ea [DAGCombiner] fold assertzexts separated by trunc
If we have an AssertZext of a truncated value that has already been AssertZext'ed, 
we can assert on the wider source op to improve the zext-y knowledge:
 assert (trunc (assert X, i8) to iN), i1 --> trunc (assert X, i1) to iN

This moves a fold from being Mips-specific to general combining, and x86 shows
improvements.

Differential Revision: https://reviews.llvm.org/D37017

llvm-svn: 313577
2017-09-18 22:05:35 +00:00
Sanjay Patel 709a804a0a [InstCombine] auto-generate complete checks; NFC
The code responsible for these transforms has the potential to add 2 
instructions and break min/max patterns (PR33301).

llvm-svn: 313575
2017-09-18 21:57:56 +00:00
Adrian Prantl c2bc717028 llvm-dwarfdump: add a --show-parents options when selectively dumping DIEs.
llvm-svn: 313567
2017-09-18 21:27:44 +00:00
Adrian Prantl f077b5b885 Fix typo in testcase.
llvm-svn: 313566
2017-09-18 21:27:42 +00:00
Konstantin Zhuravlyov ca8946a376 AMDGPU: Start selecting s_xnor_{b32, b64}
Differential Revision: https://reviews.llvm.org/D37981

llvm-svn: 313565
2017-09-18 21:22:45 +00:00
Sanjay Patel 7765c93be2 [DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217

We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization if the target chooses
to get maximum merging.

I don't think the potential regressions in the other tests are relevant. The tests are for
correctness of weird IR constructs rather than perf tests, and I think those are still correct.

Differential Revision: https://reviews.llvm.org/D37987

llvm-svn: 313564
2017-09-18 20:54:26 +00:00
Craig Topper 39cdb84560 [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext
The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext.

This fixes PR28540.

Differential Revision: https://reviews.llvm.org/D37729

llvm-svn: 313563
2017-09-18 20:49:13 +00:00
Alexey Bataev 286fe620ef [SLP] Add a test for PR34635, NFC.
llvm-svn: 313559
2017-09-18 19:33:30 +00:00
Sanjay Patel 74d12b5697 [x86] add tests for PR34217; NFC
llvm-svn: 313548
2017-09-18 18:07:50 +00:00
Simon Pilgrim 4aa28b9730 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare results.
As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 313547
2017-09-18 17:58:31 +00:00
Sanjay Patel 078d5d978c [x86] regenerate checks; NFC
llvm-svn: 313545
2017-09-18 17:33:47 +00:00
Manoj Gupta 7476f629ed [LoopVectorizer] Add more testcases for PR33804.
Summary:
Add test cases when float <-> pointer types conversion is triggered
in presence of load instructions.

Reviewers: Ayal, srhines, mkuper, rengolin

Reviewed By: rengolin

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37967

llvm-svn: 313544
2017-09-18 17:28:15 +00:00
Simon Pilgrim 0b21ef1fa3 [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.

We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.

Differential Revision: https://reviews.llvm.org/D37849

llvm-svn: 313543
2017-09-18 16:45:05 +00:00
Craig Topper 77d7f331dd [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled
The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2XF128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there.

If we have VPERMQ/PD we should prefer those since they only have a single input.

Differential Revision: https://reviews.llvm.org/D37947

llvm-svn: 313542
2017-09-18 16:39:49 +00:00
Sam Parker 3fa0ccffc6 [AArch64] Add V8_2aOps feature to Cortex-A55 and 75
Add the missing hardware features the ProcA55 and ProcA75 feature.
These are already enabled via the target parser, but I had missed
them in the backend.

Differential Revision: https://reviews.llvm.org/D37974

llvm-svn: 313535
2017-09-18 14:46:14 +00:00
Sam Parker 71efbe4c68 [ARM] Implement isTruncateFree
Implement the isTruncateFree hooks, lifted from AArch64, that are
used by TargetTransformInfo. This allows simplifycfg to reduce the
test case into a single basic block.

Differential Revision: https://reviews.llvm.org/D37516

llvm-svn: 313533
2017-09-18 14:28:51 +00:00
Simon Pilgrim 00161c9961 [X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X)
As discussed on PR28925 and D37849.

Differential Revision: https://reviews.llvm.org/D37975

llvm-svn: 313532
2017-09-18 14:23:23 +00:00
Sjoerd Meijer 4e6df15962 [ARM] Fix for indexed dot product instruction descriptions
The indexed dot product instructions only accept the lower 16 D-registers as
the indexed register, but we were e.g. incorrectly accepting:

vudot.u8 d16,d16,d18[0]

Differential Revision: https://reviews.llvm.org/D37968

llvm-svn: 313531
2017-09-18 14:17:57 +00:00
Jonas Devlieghere c0a758d8ab [dwarfdump] Make .eh_frame an alias for .debug_frame
This patch makes the `.eh_frame` extension an alias for `.debug_frame`.
Up till now it was only possible to dump the section using objdump, but
not with dwarfdump. Since the two are essentially interchangeable, we
dump whichever of the two is present.

As a workaround, this patch also adds parsing for 3 currently
unimplemented CFA instructions: `DW_CFA_def_cfa_expression`,
`DW_CFA_expression`, and `DW_CFA_val_expression`. Because I lack the
required knowledge, I just parse the fields without actually creating
the instructions.

Finally, this also fixes the typo in the `.debug_frame` section name
which incorrectly contained a trailing `s`.

Differential revision: https://reviews.llvm.org/D37852

llvm-svn: 313530
2017-09-18 14:15:57 +00:00
Simon Pilgrim 360629d170 [X86][SSE] Add vselect with zero tests (PR28925)
llvm-svn: 313529
2017-09-18 13:32:33 +00:00
Nikolai Bozhenov 84af99b3b1 [X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs.
Summary:
Subregister liveness tracking is not implemented for X86 backend, so
sometimes the whole super register is said to be live, when only a
subregister is really live. That might happen if the def and the use
are located in different MBBs, see added fixup-bw-isnt.mir test.

However, using knowledge of the specific instructions handled by the
bw-fixup-pass we can get more precise liveness information which this
change does.

Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper

Reviewed By: craig.topper

Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D37559

llvm-svn: 313524
2017-09-18 10:17:59 +00:00
Mohammed Agabaria 77cb080c2d [X86][Codegen] adding masked gathers tests for avx2
related to patch: https://reviews.llvm.org/D35772
adding llvm gathers test before gathers codegen support.

Differential Revision: https://reviews.llvm.org/D37800

llvm-svn: 313516
2017-09-18 06:49:54 +00:00
Dean Michael Berris 0f84a7d355 [XRay][tools] Support tail-call exits before we write them in the runtime
Summary:
This change adds support for explicit tail-exit records to be written by
the XRay runtime. This lets us differentiate the tail exit
records/events in the log, and allows us to treat those exit events
especially in the future. For now we allow printing those out in YAML
(and reading them in).

Reviewers: kpw, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37964

llvm-svn: 313514
2017-09-18 06:08:46 +00:00
Craig Topper a6054328e8 [X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.

llvm-svn: 313509
2017-09-18 04:40:58 +00:00
Craig Topper 87f7381edf [X86] Teach execution domain fixing to convert between FP and int unpack instructions.
llvm-svn: 313508
2017-09-18 03:29:54 +00:00
Craig Topper d4341920d5 [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD.
llvm-svn: 313507
2017-09-18 03:29:47 +00:00
Craig Topper ee6646d7de [X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only.
llvm-svn: 313504
2017-09-17 22:36:41 +00:00
Craig Topper 6c221690a3 [X86] Add a couple more unary shuffles to the sse1 shuffle test.
These can be implemented with movlhps and movhlps.

llvm-svn: 313503
2017-09-17 22:36:39 +00:00
Jatin Bhateja 356e3e2c1d Adding test cases for PR34629 & PR34634.
Differential Revision: https://reviews.llvm.org/D37962

llvm-svn: 313490
2017-09-17 18:16:26 +00:00
Alex Bradbury 8ab4a9696a [RISCV] Add support for disassembly
This Disassembly support allows for 'round-trip' testing, and rv32i-valid.s
has been updated appropriately.

Differential Revision: https://reviews.llvm.org/D23567

llvm-svn: 313486
2017-09-17 14:36:28 +00:00
Alex Bradbury 6758ecb98c [RISCV] Add support for all RV32I instructions
This patch supports all RV32I instructions as described in the RISC-V manual.
A future patch will add support for pseudoinstructions and other instruction
expansions (e.g. 0-arg fence -> fence iorw, iorw).

Differential Revision: https://reviews.llvm.org/D23566

llvm-svn: 313485
2017-09-17 14:27:35 +00:00
Igor Breger f1d388a5c5 [GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions.
llvm-svn: 313483
2017-09-17 11:34:17 +00:00
Igor Breger 0f382ccb68 [GlobalISel][X86] Use correct physical register in mir tests.NFC.
llvm-svn: 313479
2017-09-17 08:30:42 +00:00
Igor Breger 21200ed7af [GlobalISel][X86] G_FCONSTANT support.
Summary: G_FCONSTANT support, port the implementation from X86FastIsel.

Reviewers: zvi, delena, guyblank

Reviewed By: delena

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37734

llvm-svn: 313478
2017-09-17 08:08:13 +00:00
Zachary Turner 6326d56195 [llvm-symbolizer] Fix coff-dwarf.test
This was a bug in the test that was only exposed as a result of
refactoring some code in lit configuration files.  Previously,
llvm's lit configuration would only set the target-windows feature
if the system was also windows.  Since cross-compilation is
a thing, this isn't correct.  target-windows should be set
independently of system-windows.

Adding to that bug, this particular test then checked for
target-windows when it really meant "can I call a certain API on
the host machine", which is what system-windows is for.

Ultimately, this test only works if *both* the target and host
are Windows, so I've updated the test to reflect that.

llvm-svn: 313468
2017-09-16 19:01:04 +00:00
Zachary Turner c3023d1bd1 Resubmit "Add a shared llvm.lit module that all test suites can use."
There were some issues surrounding Py2 / Py3 compatibility, but
I've now tested with both Py2 and Py3 and everything seems to
work.

llvm-svn: 313467
2017-09-16 18:46:21 +00:00
Adrian Prantl 597aa48d11 llvm-dwarfdump: support a --show-children option
This will print all children of a DIE when selectively printing only
one DIE at a given offset.

llvm-svn: 313464
2017-09-16 17:28:00 +00:00
Adrian Prantl 099d7e452a llvm-dwarfdump: Add support for -debug-types=<offset>.
llvm-svn: 313463
2017-09-16 16:58:18 +00:00
George Rimar 762abff698 [llvm-readobj] - Teach tool to report error if some section is in multiple COMDAT groups at once.
readelf tool reports an error when output contains the same section
in multiple COMDAT groups. That can be useful.
Path teaches llvm-readobj to do the same.

Differential revision: https://reviews.llvm.org/D37567

llvm-svn: 313459
2017-09-16 14:29:51 +00:00
Sanjay Patel 65d6780703 [x86] enable storeOfVectorConstantIsCheap() target hook
This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). 
All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are 
handled separately in there using the appropriate hooks.

For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. 
So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449:
https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug)

Differential Revision: https://reviews.llvm.org/D37451

llvm-svn: 313458
2017-09-16 13:29:12 +00:00
Craig Topper 23f78c1662 [X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode.
We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but its easy enough to fix in isel.

Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel.

llvm-svn: 313455
2017-09-16 09:16:48 +00:00
Craig Topper 0d1b519f78 [X86] Remove unused check lines that got left behind when I moved tests to the instrinsic upgrade file and regenerated.
llvm-svn: 313454
2017-09-16 09:16:46 +00:00
Craig Topper 8374ffde08 [X86] Remove the vperm2f128 test file I just added in r313450.
I missed the we already had a pretty thorough test file for these instructions.

llvm-svn: 313451
2017-09-16 07:51:01 +00:00
Craig Topper f264fcc704 [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.

llvm-svn: 313450
2017-09-16 07:36:14 +00:00
Craig Topper aa499c1cb2 [X86] Fix some FileCheck lines that use the wrong prefix.
Assume they were moved during autoupgrading and not changed.

llvm-svn: 313448
2017-09-16 07:13:39 +00:00
Craig Topper 950b19515a [X86] Don't set reserved bits in the immediate in the test cases for vperm2f128.
I'm going to autoupgrade these intrinsics in a future commit. This bit will never be set in the resulting output so pre-removing the bit.

llvm-svn: 313434
2017-09-16 02:11:21 +00:00
Craig Topper 9313df747d [X86] Remove slash in front of a CHECK line in a test.
llvm-svn: 313433
2017-09-16 01:43:21 +00:00
Eric Beckmann 913213c8ae Revert "Fix Bug 30978 by emitting cv file checksums."
This reverts commit 6389e7aa724ea7671d096f4770f016c3d86b0d54.

There is a bug in this implementation where the string value of the
checksum is outputted, instead of the actual hex bytes.  Therefore the
checksum is incorrect, and this prevent pdbs from being loaded by visual
studio.  Revert this until the checksum is emitted correctly.

llvm-svn: 313431
2017-09-16 01:14:36 +00:00
Zachary Turner 525b09d347 Revert lit changes related to lit.llvm module.
It looks like this is going to be non-trivial to get working
in both Py2 and Py3, so for now I'm reverting until I have time
to fully test it under Python 3.

llvm-svn: 313429
2017-09-16 00:52:49 +00:00
Zachary Turner 2aa5a92bdb Resubmit "[lit] Add a lit.llvm module that all llvm projects can use"
This was reverted alongside the revert of the lit/llvm-lit refactor,
but now that that has re-landed, I'm relanding this as well.

llvm-svn: 313426
2017-09-16 00:25:58 +00:00
Craig Topper d02179cd9c [X86] Remove usages of vperm2f intrinsics from fast isel tests to match what clang generates after r313418.
llvm-svn: 313424
2017-09-15 23:53:43 +00:00
Adrian Prantl 057d336c0d llvm-dwarfdump: Add support for -debug-info=<offset>.
This is the first of many commits that enable selectively dumping just
one record from the debug info.

This reapplies r313412 with some extra qualification to appease GCC and MSVC.

llvm-svn: 313419
2017-09-15 23:04:04 +00:00
Vedant Kumar 51d8f887db [llvm-cov] Avoid over-counting covered lines and regions
* Fix an unsigned integer overflow in the logic that computes the
  number of uncovered lines in a function.

* When aggregating region and line coverage summaries, take into account
  that different instantiations may have a different number of regions.

The new test case provides test coverage for both bugs. I also verified
this change by preparing a coverage report for a stage2 build of llc --
the new assertions should detect any outstanding over-counting bugs.

Fixes PR34613.

llvm-svn: 313417
2017-09-15 23:00:02 +00:00
Adrian Prantl b5abcc558d Revert "llvm-dwarfdump: Add support for -debug-info=<offset>."
This reverts commit r313412 because of a g++ incompatibility.

llvm-svn: 313413
2017-09-15 22:47:16 +00:00
Adrian Prantl fb5d284e97 llvm-dwarfdump: Add support for -debug-info=<offset>.
This is the first of many commits that enable selectively dumping just
one record from the debug info.

llvm-svn: 313412
2017-09-15 22:37:56 +00:00
Guozhi Wei 3d1305f6da [TargetTransformInfo] Static alloca has 0 cost
Static alloca usually doesn't generate any machine instructions, so it has 0 cost.

Differential Revision: https://reviews.llvm.org/D37879

llvm-svn: 313410
2017-09-15 22:28:12 +00:00
Chandler Carruth beb22b5437 [SLP] Revert r312791 and other necessary commits, except for TTI and
CostModel.

The original patch added support for horizontal min/max reductions to
the SLP vectorizer.

This patch causes LLVM to miscompile fairly simple signed min
reductions. I have attached a test progrom to http://llvm.org/PR34635
that shows the behavior change after this patch. We found this in a test
for the open source Eigen library, but also in other code.

Unfortunately, the revert is moderately challenging. It required
reverting:
r313042: [SLP] Test with multiple uses of conditional op and wrong parent.
r312853: [SLP] Fix buildbots, NFC.
r312793: [SLP] Fix the warning about paths not returning the value, NFC.
r312791: [SLP] Support for horizontal min/max reduction.

And even then, I had to completely skip reverting the changes to TTI and
CostModel because r312832 rewrote so much of this code. Plus, the cost
modeling changes aren implicated in the miscompile, so they should be
fine and will just not be used until this gets re-introduced.

llvm-svn: 313409
2017-09-15 22:23:27 +00:00
Zachary Turner ce92db13ea Resubmit "[lit] Force site configs to run before source-tree configs"
This is a resubmission of r313270.  It broke standalone builds of
compiler-rt because we were not correctly generating the llvm-lit
script in the standalone build directory.

The fixes incorporated here attempt to find llvm/utils/llvm-lit
from the source tree returned by llvm-config.  If present, it
will generate llvm-lit into the output directory.  Regardless,
the user can specify -DLLVM_EXTERNAL_LIT to point to a specific
lit.py on their file system.  This supports the use case of
someone installing lit via a package manager.  If it cannot find
a source tree, and -DLLVM_EXTERNAL_LIT is either unspecified or
invalid, then we print a warning that tests will not be able
to run.

Differential Revision: https://reviews.llvm.org/D37756

llvm-svn: 313407
2017-09-15 22:10:46 +00:00
Reid Kleckner 3a66c1cb58 [DebugInfo] Insert DW_OP_deref when spilling indirect DBG_VALUEs
Summary:
This comes up in optimized debug info for C++ programs that pass and
return objects indirectly by address. In these programs,
llvm.dbg.declare survives optimization, which causes us to emit indirect
DBG_VALUE instructions. The fast register allocator knows to insert
DW_OP_deref when spilling indirect DBG_VALUE instructions, but the
LiveDebugVariables did not until this change.

This fixes part of PR34513. I need to look into why this doesn't work at
-O0 and I'll send follow up patches to handle that.

Reviewers: aprantl, dblaikie, probinson

Subscribers: qcolombet, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37911

llvm-svn: 313400
2017-09-15 21:54:38 +00:00
Reid Kleckner 9e6c309ef3 [DebugInfo] Add missing DW_OP_deref when an NRVO pointer is spilled
Summary:
Fixes PR34513.

Indirect DBG_VALUEs typically come from dbg.declares of non-trivially
copyable C++ objects that must be passed by address. We were already
handling the case where the virtual register gets allocated to a
physical register and is later spilled. That's what usually happens for
normal parameters that aren't NRVO variables: they usually appear in
physical register parameters, and are spilled later in the function,
which would correctly add deref.

NRVO variables are different because the dbg.declare can come much later
after earlier instructions cause the incoming virtual register to be
spilled.

Also, clean up this code. We only need to look at the first operand of a
DBG_VALUE, which eliminates the operand loop.

Reviewers: aprantl, dblaikie, probinson

Subscribers: MatzeB, qcolombet, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37929

llvm-svn: 313399
2017-09-15 21:49:56 +00:00
Steven Wu ab211df5de [AutoUpgrade] Fix a compatibility issue with module flag
Summary:
After r304661, module flag to record objective-c image info section is
encoded without whitespaces after comma. The new name is equivalent to
the old one, except that when LTO a module built by old compiler and a
module built by a new compiler, it will fail with conflicting values.

Fix the issue by removing whitespaces in bitcode upgrade path.

rdar://problem/34416934

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: mehdi_amini, hans, llvm-commits

Differential Revision: https://reviews.llvm.org/D37909

llvm-svn: 313398
2017-09-15 21:12:14 +00:00
Sam Clegg 759631c77b [WebAssembly] MC: Create wasm data segments based on MCSections
This means that we can honor -fdata-sections rather than
always creating a segment for each symbol.

It also allows for a followup change to add .init_array and friends.

Differential Revision: https://reviews.llvm.org/D37876

llvm-svn: 313395
2017-09-15 20:54:59 +00:00
Davide Italiano dee018c51f [ConstantFold] Return the correct type when folding a GEP with vector indices.
As Eli pointed out (and I got wrong in the first place), langref says: "The
getelementptr returns a vector of pointers, instead of a single address, when one
or more of its arguments is a vector. In such cases, all vector arguments should
have the same number of elements, and every scalar argument will be effectively
broadcast into a vector during address calculation."

Costantfold for gep doesn't really take in account this paragraph, returning a
pointer instead of a vector of pointer which triggers an assertion in RAUW,
as we're trying to replace values with mistmatching types.

Differential Revision:  https://reviews.llvm.org/D37928

llvm-svn: 313394
2017-09-15 20:53:05 +00:00
Vivek Pandya b5ab895e2a This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled. 
However LLVM-C API users can still provide callback function for diagnostic handler.

llvm-svn: 313390
2017-09-15 20:10:09 +00:00
Vivek Pandya df8598dcc4 This reverts r313381
llvm-svn: 313387
2017-09-15 19:53:54 +00:00
Vivek Pandya 00d887447b This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352
It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with
command line options.
The diagnostic handler used to be callback now this patch adds a class
DiagnosticHandler. It has virtual method to provide custom diagnostic handler
and methods to control which particular remarks are enabled. 
However LLVM-C API users can still provide callback function for diagnostic handler.

llvm-svn: 313382
2017-09-15 19:30:59 +00:00
Sam Clegg aff1c4df25 [WebAssembly] MC: Fix crash in getProvitionalValue on weak references
- Create helper function for resolving weak references.
- Add test that preproduces the crash.

Differential Revision: https://reviews.llvm.org/D37916

llvm-svn: 313381
2017-09-15 19:22:01 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will no look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Eric Beckmann 349746f044 Fix Bug 30978 by emitting cv file checksums.
Summary:
The checksums had already been placed in the IR, this patch allows
MCCodeView to actually write it out to an MCStreamer.

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D37157

llvm-svn: 313374
2017-09-15 18:20:28 +00:00
Craig Topper 7a183e2760 [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128
The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks.

But I think we want to allow VPERMQ for any unary shuffle.

Differential Revision: https://reviews.llvm.org/D37893

llvm-svn: 313373
2017-09-15 18:11:13 +00:00
Craig Topper e0d724cf51 [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors
When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split.

This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues.

Fixes PR34605.

Differential Revision: https://reviews.llvm.org/D37858

llvm-svn: 313366
2017-09-15 17:09:03 +00:00
Craig Topper 143797eb89 [X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction as already produced the zeros.
Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64.

Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way.

So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing extra instruction into the reduction loop.

I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted.

Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544.

Differential Revision: https://reviews.llvm.org/D37653

llvm-svn: 313365
2017-09-15 17:09:00 +00:00
Anna Thomas f34537dff8 [RuntimeUnroll] Add heuristic for unrolling multi-exit loop
Add a profitability heuristic to enable runtime unrolling of multi-exit
loop: There can be atmost two unique exit blocks for the loop and the
second exit block should be a deoptimizing block. Also, there can be one
other exiting block other than the latch exiting block. The reason for
the latter is so that we limit the number of branches in the unrolled
code to being at most the unroll factor.  Deoptimizing blocks are rarely
taken so these additional number of branches created due to the
unrolling are predictable, since one of their target is the deopt block.

Reviewers: apilipenko, reames, evstupac, mkuper

Subscribers: llvm-commits

Reviewed by: reames

Differential Revision: https://reviews.llvm.org/D35380

llvm-svn: 313363
2017-09-15 15:56:05 +00:00
Anna Thomas 512dde77ba [RuntimeUnrolling] Populate the VMap entry correctly when default generated through lookup
During runtime unrolling on loops with multiple exits, we update the
exit blocks with the correct phi values from both original and remainder
loop.
In this process, we lookup the VMap for the mapped incoming phi values,
but did not update the VMap if a default entry was generated in the VMap
during the lookup. This default value is generated when constants or
values outside the current loop are looked up.
This patch fixes the assertion failure when null entries are present in
the VMap because of this lookup. Added a testcase that showcases the
problem.

llvm-svn: 313358
2017-09-15 13:29:33 +00:00
Simon Pilgrim 905e79c4dc [X86][SSE] Add test cases vector for integer multiplies
Mainly inspired by PR34474 / D37896

llvm-svn: 313353
2017-09-15 11:17:42 +00:00
Ilya Biryukov d23faa843e Revert "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops."
This reverts commit r313348.

Reason: it caused buildbot failures.
llvm-svn: 313352
2017-09-15 10:15:00 +00:00
Sjoerd Meijer 0c5ba21cbf [AArch64] allow v8f16 types when FullFP16 is supported
This adds support for allowing v8f16 vector types, thus avoiding conversions
from/to single precision for these types. This is a follow up patch of
commits r311154 and r312104, which added support for scalars and v4f16
types, respectively.

Differential Revision: https://reviews.llvm.org/D37802

llvm-svn: 313351
2017-09-15 09:24:48 +00:00
Dinar Temirbulatov e2358b53bc [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.

Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev, davide

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D28907

llvm-svn: 313348
2017-09-15 06:56:39 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
       For above DAG rooted at T3, X86AddressMode will no look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Zachary Turner 83dcb68468 Revert "[lit] Force site configs to run before source-tree configs"
This patch is still breaking several multi-stage compiler-rt bots.
I already know what the fix is, but I want to get the bots green
for now and then try re-applying in the morning.

llvm-svn: 313335
2017-09-15 02:56:40 +00:00
Reid Kleckner 87288b98b6 [codeview] Use a type index of zero for static method "this" types
Otherwise VS won't show anything in the autos or watch window of static
methods.

llvm-svn: 313329
2017-09-15 00:59:07 +00:00
Zachary Turner 7f9c0ce1bb [lit] Revert "Add a lit.llvm module that all llvm projects can use"
This is breaking due to some changes I forgot to merge in, so I'm
temporarily reverting them until I can re-test that this works.

llvm-svn: 313328
2017-09-15 00:56:08 +00:00
Zachary Turner e5412b882a [lit] Add a lit.llvm module that all test suites can use.
To further reduce duplicate code, this patch introduces a module
that configs can simply import and get access to a lot of useful
functionality such as setting up paths, adding features that are
useful across all projects, and other utility-type functions.

For now this only updates llvm's suite to use this new library,
but subsequent patches will update other projects.

Differential Revision: https://reviews.llvm.org/D37778

llvm-svn: 313325
2017-09-15 00:34:00 +00:00
Sam Clegg 7c39594357 [WebAssembly] Use a separate wasm data segment for each global symbol
This is stepping stone towards honoring -fdata-sections
and letting the assembler decide how many wasm data
segments to create.

Differential Revision: https://reviews.llvm.org/D37834

llvm-svn: 313313
2017-09-14 23:07:53 +00:00
Matt Arsenault c317287fde AMDGPU: Fix violating constant bus restriction
You can't use madmk/madmk if it already uses an SGPR input.

llvm-svn: 313298
2017-09-14 20:54:29 +00:00
Guozhi Wei 21f8fad909 [TargetTransformInfo] Detect 0 latency instructions
For instructions that unlikely generate machine instructions, they should also have 0 latency.

Differential Revision: https://reviews.llvm.org/D37833

llvm-svn: 313288
2017-09-14 19:20:02 +00:00
Matt Arsenault 37ab4cf8b8 AMDGPU: Fix assert on alloca of array of struct
llvm-svn: 313282
2017-09-14 18:02:29 +00:00
Simon Dardis 6819a7a1cb [bpf] Fix test to always use little endian.
r313055 broke the big endian buildbots as the CHECK lines contained little
endian data but -triple bpf uses the host endian.

llvm-svn: 313281
2017-09-14 17:55:50 +00:00
Matt Arsenault defe371771 AMDGPU: Stop modifying SP in call sequences
Because the stack growth direction and addressing is done
in the same direction, modifying SP at the beginning of the
call sequence was incorrect. If we had a stack passed argument,
we would end up skipping that number of bytes before pushing
arguments, leaving unused/inconsistent space.

The callee creates fixed stack objects in its frame, so
the space necessary for these is already logically allocated
in the callee, so we just let the callee increment SP if
it really requires it.

llvm-svn: 313279
2017-09-14 17:37:40 +00:00
Dehao Chen 3a81f84d9a Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: vitalybuka, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313277
2017-09-14 17:29:56 +00:00
Simon Dardis 55e446737f [mips] Implement the 'dext' aliases and it's disassembly alias.
The other members of the dext family of instructions (dextm, dextu) are
traditionally handled by the assembler selecting the right variant of
'dext' depending on the values of the position and size operands.

When these instructions are disassembled, rather than reporting the
actual instruction, an equivalent aliased form of 'dext' is generated
and is reported. This is to mimic the behaviour of binutils.

Reviewers: slthakur, nitesh.jain, atanasyan

Differential Revision: https://reviews.llvm.org/D34887

llvm-svn: 313276
2017-09-14 17:27:53 +00:00
Adrian Prantl 82959cd5d8 Adapt more testcases for llvm-dwarfdump changes.
llvm-svn: 313275
2017-09-14 17:27:03 +00:00
Matt Arsenault 6efd082c01 AMDGPU: Make frame register caller preserved
Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted after the true prolog where the FP
was clobbered.

I have a hacky solution which works that continues to use
split CSR, but for now this is simpler and will get to working
programs.

llvm-svn: 313274
2017-09-14 17:14:57 +00:00
Adrian Prantl 5fd3d49bc4 llvm-dwarfdump: support dumping static archives.
llvm-svn: 313272
2017-09-14 17:01:53 +00:00
Krzysztof Parzyszek 779d98e1c0 TableGen support for parameterized register class information
This replaces TableGen's type inference to operate on parameterized
types instead of MVTs, and as a consequence, some interfaces have
changed:
- Uses of MVTs are replaced by ValueTypeByHwMode.
- EEVT::TypeSet is replaced by TypeSetByHwMode.

This affects the way that types and type sets are printed, and the
tests relying on that have been updated.

There are certain users of the inferred types outside of TableGen
itself, namely FastISel and GlobalISel. For those users, the way
that the types are accessed have changed. For typical scenarios,
these replacements can be used:
- TreePatternNode::getType(ResNo) -> getSimpleType(ResNo)
- TreePatternNode::hasTypeSet(ResNo) -> hasConcreteType(ResNo)
- TypeSet::isConcrete -> TypeSetByHwMode::isValueTypeByHwMode(false)

For more information, please refer to the review page.

Differential Revision: https://reviews.llvm.org/D31951

llvm-svn: 313271
2017-09-14 16:56:21 +00:00
Zachary Turner a0e55b6403 [lit] Force site configs to be run before source-tree configs
This patch simplifies LLVM's lit infrastructure by enforcing an ordering
that a site config is always run before a source-tree config.

A significant amount of the complexity from lit config files arises from
the fact that inside of a source-tree config file, we don't yet know if
the site config has been run.  However it is *always* required to run
a site config first, because it passes various variables down through
CMake that the main config depends on.  As a result, every config
file has to do a bunch of magic to try to reverse-engineer the location
of the site config file if they detect (heuristically) that the site
config file has not yet been run.

This patch solves the problem by emitting a mapping from source tree
config file to binary tree site config file in llvm-lit.py. Then, during
discovery when we find a config file, we check to see if we have a
target mapping for it, and if so we use that instead.

This mechanism is generic enough that it does not affect external users
of lit. They will just not have a config mapping defined, and everything
will work as normal.

On the other hand, for us it allows us to make many simplifications:

* We are guaranteed that a site config will be executed first
* Inside of a main config, we no longer have to assume that attributes
  might not be present and use getattr everywhere.
* We no longer have to pass parameters such as --param llvm_site_config=<path>
  on the command line.
* It is future-proof, meaning you don't have to edit llvm-lit.in to add
  support for new projects.
* All of the duplicated logic of trying various fallback mechanisms of
  finding a site config from the main config are now gone.

One potentially noteworthy thing that was required to implement this
change is that whereas the ninja check targets previously used the first
method to spawn lit, they now use the second. In particular, you can no
longer run lit.py against the source tree while specifying the various
`foo_site_config=<path>` parameters.  Instead, you need to run
llvm-lit.py.

Differential Revision: https://reviews.llvm.org/D37756

llvm-svn: 313270
2017-09-14 16:47:58 +00:00
Krzysztof Parzyszek 6ca02b25a7 [IfConversion] More simple, correct dead/kill liveness handling
Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D37611

llvm-svn: 313268
2017-09-14 15:53:11 +00:00
Simon Dardis 6f83ae38a3 [mips] Implement the 'dins' aliases.
Traditionally GAS has provided automatic selection between dins, dinsm and
dinsu. Binutils also disassembles all instructions in that family as 'dins'
rather than the actual instruction.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D34877

llvm-svn: 313267
2017-09-14 15:17:50 +00:00
Sanjay Patel 0d4fd5b668 [InstSimplify] fold sdiv/srem based on compare of dividend and divisor
This should bring signed div/rem analysis up to the same level as unsigned. 
We use icmp simplification to determine when the divisor is known greater than the dividend.

Each positive test is followed by a negative test to show that we're not overstepping the boundaries of the known bits.
There are extra tests for the signed-min-value special cases.

Alive proofs:
http://rise4fun.com/Alive/WI5

Differential Revision: https://reviews.llvm.org/D37713

llvm-svn: 313264
2017-09-14 14:59:07 +00:00
Chad Rosier 4d6d74e236 Add newline to end of test file. NFC.
llvm-svn: 313263
2017-09-14 14:48:59 +00:00
Simon Pilgrim 0b220c7524 [X86] Regenerate test. NFCI.
llvm-svn: 313259
2017-09-14 13:00:27 +00:00
Simon Pilgrim 47d8f62472 Regenerate test (broadcast comment). NFCI.
llvm-svn: 313258
2017-09-14 12:41:19 +00:00
Ayman Musa ab68449c53 [X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first.
Fix issue described in PR34577.

Differential Revision: https://reviews.llvm.org/D37803

llvm-svn: 313256
2017-09-14 12:06:38 +00:00
Simon Dardis 28365b33ad [mips] Pick the right variant of DINS upfront and enable target instruction verification
This patch complements D16810 "[mips] Make isel select the correct DEXT variant
up front.". Now ISel picks the right variant of DINS, so now there is no need
to replace DINS with the appropriate variant during
MipsMCCodeEmitter::encodeInstruction().

This patch also enables target specific instruction verification for ins, dins,
dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that
are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these
constraints are not checked during instruction selection. Adding machine
verification should catch outstanding cases.

Finally, correct a bug that instruction verification uncovered, where the
position operand of a DINSU generated during lowering was being silently
and accidently corrected to the correct value.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D34809

llvm-svn: 313254
2017-09-14 10:58:00 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
Simon Pilgrim 337b2d007a Fix line endings. NFCI.
llvm-svn: 313247
2017-09-14 10:30:54 +00:00
Simon Pilgrim 11e2969a35 Fix line endings. NFCI.
llvm-svn: 313246
2017-09-14 10:30:22 +00:00
Dean Michael Berris d7b3add7c4 [XRay][DebugInfo] Update the test to use a specific target
Follow-up to D37791.

llvm-svn: 313243
2017-09-14 09:58:25 +00:00
Chandler Carruth 7376ae88eb [PM/CGSCC] Teach the CGSCC pass manager components to gracefully handle
invalidated SCCs even when we do not have an updated SCC to redirect
towards.

This comes up in a fairly subtle and surprising circumstance: we need to
have a connected but internal node in the call graph which later becomes
a disconnected island, and then gets deleted. All of this needs to
happen mid-CGSCC walk. Because it is disconnected, we have no way of
computing a new "current" SCC when it gets deleted. Instead, we need to
explicitly check for a deleted "current" SCC and bail out of the current
CGSCC step. This will bubble all the way up to the post-order walk and
then resume correctly.

I've included minimal tests for this bug. The specific behavior
matches something we've seen in the wild with the new PM combined with
ThinLTO and sample PGO, but I've not yet confirmed whether this is the
only issue there.

llvm-svn: 313242
2017-09-14 08:33:57 +00:00
Dean Michael Berris b21f580ddc [XRay][DebugInfo] Remove -debug-compile from test invocation of llc
This breaks bootstrap builds, and is actually unnecessary. Tested
locally and it seems we can remove -debug-comile just fine.

Follow-up to D37791.

llvm-svn: 313238
2017-09-14 07:54:54 +00:00
Alon Kom 682cfc1d4c [LV] Fix maximum legal VF calculation
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.

Differential Revision: https://reviews.llvm.org/D37507

llvm-svn: 313237
2017-09-14 07:40:02 +00:00
Dean Michael Berris 01fd7c8bd4 [XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map
Summary:
XRay had been assuming that the previous section is the "text" section
of the function when lowering the instrumentation map. Unfortunately
this is not a safe assumption, because we may be coming from lowering
debug type information for the function being lowered.

This fixes an issue with combining -gsplit-dwarf, -generate-type-units,
-debug-compile and -fxray-instrument for sole member functions. When the
split dwarf section is stripped, we're left with references from the
xray_instr_map to the debug section. The change now uses the function's
symbol instead of the previous section's start symbol.

We found the bug while attempting to strip the split debug sections off
an XRay-instrumented object file, which had a peculiar edge-case for
single-function classes where the single function is being lowered.
Because XRay had assocaited the instrumentation map for a function to
the debug types section instead of the function's section, the objcopy
call will fail due to the misplaced reference from the xray_instr_map
section.

Reviewers: pcc, dblaikie, echristo

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D37791

llvm-svn: 313233
2017-09-14 07:08:23 +00:00
Vitaly Buka 48624d327a Revert "Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader."
Patch introduced uninitialized value.

This reverts commit r313195.

llvm-svn: 313230
2017-09-14 05:40:33 +00:00
Peter Collingbourne cfbd089237 Reland r313157, "ThinLTO: Correctly follow aliasee references when dead stripping." which was reverted in r313222.
This reland includes a fix for the LowerTypeTests pass so that it
looks past aliases when determining which type identifiers are live.

Differential Revision: https://reviews.llvm.org/D37842

llvm-svn: 313229
2017-09-14 05:02:59 +00:00
Hans Wennborg ae050afeb9 Revert r313157 "ThinLTO: Correctly follow aliasee references when dead stripping."
This broke Chromium's CFI build; see crbug.com/765004.

> We were previously handling aliases during dead stripping by adding
> the aliased global's "original name" GUID to the worklist. This will
> lead to incorrect behaviour if the global has local linkage because
> the original name GUID will not correspond to the global's GUID in
> the summary.
>
> Because an alias is just another name for the global that it
> references, there is no need to mark the referenced global as used,
> or to follow references from any other copies of the global. So all
> we need to do is to follow references from the aliasee's summary
> instead of the alias.
>
> Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313222
2017-09-14 00:40:14 +00:00
NAKAMURA Takumi 38fac5905e Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in wrong place.
llvm-svn: 313218
2017-09-14 00:03:23 +00:00
Matt Arsenault ecb43ef1bc AMDGPU: Don't spill SP reg like a normal CSR
llvm-svn: 313217
2017-09-13 23:47:01 +00:00
Hans Wennborg 06e2a384c2 Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.

> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313213
2017-09-13 23:23:09 +00:00
Adrian Prantl c113801489 Update testcase that was XFAILed on Darwin for llvm-dwarfdump change.
llvm-svn: 313209
2017-09-13 22:30:43 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Adrian Prantl 3ae35eb56b llvm-dwarfdump: automatically dump both regular and .dwo variant of sections
Since users typically don't really care about the .dwo / non.dwo
distinction, this patch makes it so dwarfdump --debug-<info,...> dumps
.debug_info and (if available) also .debug_info.dwo. This simplifies
the command line interface (I've removed all dwo-specific dump
options) and makes the tool friendlier to use.

Differential Revision: https://reviews.llvm.org/D37771

llvm-svn: 313207
2017-09-13 22:09:01 +00:00
Reid Kleckner 89af112cf5 [codeview] VLAs and unsized arrays should use a size of zero
Previously we used a size of '1' for VLAs because we weren't sure what
MSVC did. However, MSVC does support declaring an array without a size,
for which it emits an array type with a size of zero. Clang emits the
same DI metadata for VLAs and arrays without bound, so we would describe
arrays without bound as having one element. This lead to Microsoft
debuggers only printing a single element.

Emitting a size of zero appears to cause these debuggers to search the
symbol information to find a definition of the variable with accurate
array bounds.

Fixes http://crbug.com/763580

llvm-svn: 313203
2017-09-13 21:54:20 +00:00
Dehao Chen be707f2930 Update the early_inline test to explicitly add attribute for all functions. (NFC)
llvm-svn: 313202
2017-09-13 21:49:36 +00:00
Wei Mi a2a135a01c Add a comment for the test. NFC.
llvm-svn: 313199
2017-09-13 21:47:13 +00:00
Wei Mi c0d066468e [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
This is to fix PR34502. After rL311401, the live range of spilled vreg will be
cleared. HoistSpill need to use the live range of the original vreg before splitting
to know the moving range of the spills. The patch saves a copy of live interval for
the spilled vreg inside of HoistSpillHelper.

Differential Revision: https://reviews.llvm.org/D37578

llvm-svn: 313197
2017-09-13 21:41:30 +00:00
Vedant Kumar 4c76c45fed [Bitcode] Add a compatibility test for 5.0.0 bitcode
llvm-svn: 313196
2017-09-13 21:40:59 +00:00
Dehao Chen 15c86ef970 Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader.
Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues.

Reviewers: eraman, davidxl

Reviewed By: eraman

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D37779

llvm-svn: 313195
2017-09-13 21:22:55 +00:00
Kevin Enderby 44550600f2 Fix a crash in llvm-nm for a bad Mach-O file that has an N_SECT type symbol and a zero n_sect value.
The code in llvm-nm for Mach-O files to determine the section type for an
N_SECT type symbol it will call getSymbolSection() and check for the error,
but in the case the n_sect value is zero it will return section_end() (aka nullptr).
And the code was using that and crashing instead of just returning a ’s’ for a
section or printing (?,?) as it would if getSymbolSection() returned an error.

rdar://33136604

llvm-svn: 313193
2017-09-13 21:01:49 +00:00
Adrian McCarthy d91bf3998f Mark static member functions as static in CodeViewDebug
Summary:
To improve CodeView quality for static member functions, we need to make the
static explicit.  In addition to a small change in LLVM's CodeViewDebug to
return the appropriate MethodKind, this requires a small change in Clang to
note the staticness in the debug info metadata.

Subscribers: aprantl, hiraditya

Differential Revision: https://reviews.llvm.org/D37715

llvm-svn: 313192
2017-09-13 20:53:55 +00:00
Anna Thomas 19529f75b9 [LV] Avoid computing the register usage for default VF. NFC
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.

Suggested during review for D37702.

llvm-svn: 313176
2017-09-13 19:35:45 +00:00
Adrian Prantl 3dcd122151 llvm-dwarfdump: support dumping UUIDs of Mach-O binaries.
This is a feature supported by Darwin dwarfdump. UUIDs are used to
associate executables with their .dSYM bundles.

llvm-svn: 313165
2017-09-13 18:22:59 +00:00
Sanjay Patel dc3002de9d [InstSimplify] regenerate checks; NFC
llvm-svn: 313161
2017-09-13 17:39:39 +00:00
Peter Collingbourne d067c8ed59 ThinLTO: Correctly follow aliasee references when dead stripping.
We were previously handling aliases during dead stripping by adding
the aliased global's "original name" GUID to the worklist. This will
lead to incorrect behaviour if the global has local linkage because
the original name GUID will not correspond to the global's GUID in
the summary.

Because an alias is just another name for the global that it
references, there is no need to mark the referenced global as used,
or to follow references from any other copies of the global. So all
we need to do is to follow references from the aliasee's summary
instead of the alias.

Differential Revision: https://reviews.llvm.org/D37789

llvm-svn: 313157
2017-09-13 17:09:20 +00:00
Teresa Johnson b1bb468aa9 Fix bot failures by requiring x86 target in new test
The test added in r313151 requires a target triple since it is
running through code generation. Fix bot failures by requiring
an x86 target.

llvm-svn: 313153
2017-09-13 15:35:35 +00:00
Teresa Johnson 1958083d35 [ThinLTO] For SamplePGO, need to handle ICP targets consistently in thin link
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.

This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37783

llvm-svn: 313151
2017-09-13 15:16:38 +00:00
Petar Jovanovic 50e068158b [mips] correct operand range for DINSM instruction
This patch corrects the definition of the DINSM instruction.
Specification for DINSM instruction for Mips64 says that size operand should
be 2 <= size <= 64, but it is defined as uimm5_inssize_plus1 which gives
range of 1 .. 32.

Patch by Aleksandar Beserminji.

Differential Revision: https://reviews.llvm.org/D37683

llvm-svn: 313149
2017-09-13 14:09:13 +00:00
Stefan Pintilie dff606ec3e [Power9] Add missing instructions: extswsli, popcntb
Added the following P9 instructions: extswsli, extswsli., popcntb

Differential Revision: https://reviews.llvm.org/D37342

llvm-svn: 313147
2017-09-13 14:05:27 +00:00
Jonas Devlieghere 81f5abe1ad [MachO] Prevent heap overflow when load command extends past EOF
This patch fixes a heap-buffer-overflow when a malformed Mach-O has a
load command who's size extends past the end of the binary.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3225

Differential revision: https://reviews.llvm.org/D37439

llvm-svn: 313145
2017-09-13 13:43:01 +00:00
Gadi Haber 35f4d7ca46 [X86][Skylake] Replacing -mcpu=skx by -mattr in a codegen test. NFC.
NFC.
Replacing -mcpu=skx by -mattr in the run command of the codegen test: avx512-gather-scatter-intrin.ll.

Reviewers: delena
Revision: https://reviews.llvm.org/D37799
llvm-svn: 313144
2017-09-13 12:39:18 +00:00
Simon Pilgrim f613a45bf3 [X86][FMA4] Test FMA4 commutation with repeated ops as well as FMA3
llvm-svn: 313143
2017-09-13 11:21:38 +00:00
Simon Pilgrim 322fc53725 [X86][FMA] Added *213 fma instructions to scheduling tests
Annoyingly the 132/231 variants are pretty tricky to create when you need to due to weak FMA commutation patterns.

llvm-svn: 313142
2017-09-13 11:12:56 +00:00
Jonas Devlieghere 27476ce24b [dwarfdump] Rename Brief to Verbose in DIDumpOptions
This patches renames "brief" to "verbose" in de DIDumpOptions and
inverts the logic to match the new behavior where brief is the default.
Changing the default value uncovered some bugs related to the
DIDumpOptions not being propagated and have been fixed as well.

Differential revision: https://reviews.llvm.org/D37745

llvm-svn: 313139
2017-09-13 09:43:05 +00:00
Gadi Haber a753080d1e [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi, RKSimon
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313138
2017-09-13 09:28:25 +00:00
Gadi Haber 04de4ce9e2 [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313137
2017-09-13 09:28:18 +00:00
Gadi Haber fb47ab7cdd NFC.
Updating codegen test bmi2-schedule.ll to use the SKYLAKE and KNL prefix as preparatipn for an upcoming patch to add all SKL scheduling information.

llvm-svn: 313136
2017-09-13 09:27:39 +00:00
Igor Breger 5c721199dd [GlobalISel][X86] support G_FPEXT operation.
Summary: Support G_FPEXT operation. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34816

llvm-svn: 313135
2017-09-13 09:05:23 +00:00
Uriel Korach 5d5da5f531 [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)
This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR.

differential revision: https://reviews.llvm.org/D37693.

llvm-svn: 313134
2017-09-13 09:02:36 +00:00
Uriel Korach 53872a2d89 [X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC.
Add explicit mc-encoding checks showing that the AVX512VL ABS intrinsics are actually mapped to EVEX encoding.
This is a pre-commit for a soon to come patch which will lower x86 target specific ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37688

llvm-svn: 313131
2017-09-13 08:33:55 +00:00
Craig Topper 2b6bfda561 [X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair.
Fixes PR34589.

llvm-svn: 313126
2017-09-13 07:53:21 +00:00
Elena Demikhovsky 6cab129464 [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector
Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions.

llvm-svn: 313121
2017-09-13 06:40:26 +00:00
Ayal Zaks e2a8c0758f [LV] Fix PR34523 - avoid generating redundant selects
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.

No functional changes intended nor expected other than surviving null Cond[0]'s.

This fix follows D35725, which introduced using null to represent full masks.

Differential Revision: https://reviews.llvm.org/D37619

llvm-svn: 313119
2017-09-13 06:28:37 +00:00
Aditya Kumar dfa8741c96 [GVNHoist] Factor out reachability to search for anticipable instructions quickly
Factor out the reachability such that multiple queries to find reachability of values are fast. This is based on finding
the ANTIC points
in the CFG which do not change during hoisting. The ANTIC points are basically the dominance-frontiers in the inverse
graph. So we introduce a data structure (CHI nodes)
to keep track of values flowing out of a basic block. We only do this for values with multiple occurrences in the
function as they are the potential hoistable candidates.

This patch allows us to hoist instructions to a basic block with >2 successors, as well as deal with infinite loops in a
trivial way.
Relevant test cases are added to show the functionality as well as regression fixes from PR32821.

Regression from previous GVNHoist:
We do not hoist fully redundant expressions because fully redundant expressions are already handled by NewGVN

Differential Revision: https://reviews.llvm.org/D35918
Reviewers: dberlin, sebpop, gberry,

llvm-svn: 313116
2017-09-13 05:28:03 +00:00
Petr Hosek c113577d15 [llvm-objcopy] Add e_machine validity check for reserved section indexes
As discussed on llvm-commits it was decided it would be best to check
e_machine before declaring that a reserved section index is valid. The
only special e_machine value that matters here is EM_HEXAGON. This
change adds a special check for EM_HEXAGON.

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D37767

llvm-svn: 313114
2017-09-13 03:04:50 +00:00
Reid Kleckner feda6a0496 Fix dwarfdump cmdline test on Windows
llvm-svn: 313110
2017-09-13 01:50:27 +00:00
Reid Kleckner 8a1cd91016 [InstCombine] Add a flag to disable LowerDbgDeclare
Summary:
This should improve optimized debug info for address-taken variables at
the cost of inaccurate debug info in some situations.

We patched this into clang and deployed this change to Chromium
developers, and this significantly improved debuggability of optimized
code. The long-term solution to PR34136 seems more and more like it's
going to take a while, so I would like to commit this change under a
flag so that it can be used as a stop-gap measure.

This flag should really help so for C++ aggregates like std::string and
std::vector, which are typically address-taken, even after inlining, and
cannot be SROA-ed.

Reviewers: aprantl, dblaikie, probinson, dberlin

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D36596

llvm-svn: 313108
2017-09-13 01:43:25 +00:00
Derek Schuff a519fe5a37 [WebAssembly] Add sign extend instructions from atomics proposal
Select them from ISD::SIGN_EXTEND_INREG

Differential Revision: https://reviews.llvm.org/D37603

remove spurious change

llvm-svn: 313101
2017-09-13 00:29:06 +00:00
Peter Collingbourne 8b30b96d2e Add Linux target triple to hopefully fix Mac bots.
llvm-svn: 313093
2017-09-12 23:40:19 +00:00
Sanjay Patel ce2da1e6e4 [SimplifyCFG] update test comments; NFC
llvm-svn: 313090
2017-09-12 23:28:11 +00:00
Sanjay Patel 659279450e [x86] eliminate unnecessary vector compare for AVX masked store
The masked store instruction only cares about the sign-bit of each mask element,
so the compare s<0 isn't needed.

As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)

I filed a bug to track improvements for AVX512:
https://bugs.llvm.org/show_bug.cgi?id=34584

Differential Revision: https://reviews.llvm.org/D37446

llvm-svn: 313089
2017-09-12 23:24:05 +00:00
Adrian Prantl 7c5b45d330 Clean up the --help output of llvm-dwarfdump by hiding irrelevant options.
llvm-svn: 313085
2017-09-12 22:32:53 +00:00
Peter Collingbourne 876da0294a Remove -generate-dwarf-pub-sections flag.
This flag is unnecessary for testing because we can get the coverage
we need by adjusting CU attributes.

Differential Revision: https://reviews.llvm.org/D37725

llvm-svn: 313079
2017-09-12 21:50:55 +00:00
Peter Collingbourne b52e23669c IR: Represent -ggnu-pubnames with a flag on the DICompileUnit.
This allows the flag to be persisted through to LTO.

Differential Revision: https://reviews.llvm.org/D37655

llvm-svn: 313078
2017-09-12 21:50:41 +00:00
Ahmed Bougacha 106dd035a8 [AArch64][GlobalISel] Select all fpexts.
Tablegen already can select these: mark them as legal, remove the
c++ code, and add tests for all types.

llvm-svn: 313074
2017-09-12 21:04:11 +00:00
Ahmed Bougacha a7aa2a9fb1 [AArch64][GlobalISel] Select all fptruncs.
We already support these in tablegen, but we're matching the wrong
operator (libm ftrunc).  Fix that.

While there, drop the c++ code, support COPYs of FPR16, and add tests
for the other types.

llvm-svn: 313073
2017-09-12 21:04:10 +00:00
Lei Huang 34e6621724 Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce.

Differential Revision : https: // reviews.llvm.org/D32776

llvm-svn: 313061
2017-09-12 18:39:11 +00:00
Robert Lougher 51529eb0c2 Revert "[DWARF] Incorrect prologue end line record."
This reverts commit r313047 as it is causing buildbot failure (lldb inline
stepping tests).

llvm-svn: 313057
2017-09-12 18:23:15 +00:00
Yonghong Song 06ff655e59 bpf: Add BPF AsmParser support in LLVM
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 313055
2017-09-12 17:55:23 +00:00
Craig Topper 958106d0f1 [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.

This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel.

I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.

Differential Revision: https://reviews.llvm.org/D37592

llvm-svn: 313054
2017-09-12 17:40:25 +00:00
Elena Demikhovsky 18ff5c1374 Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence.
llvm-svn: 313052
2017-09-12 17:27:53 +00:00
Robert Lougher f696a22d3c [DWARF] Incorrect prologue end line record.
A prologue-end line record is emitted with an incorrect associated address,
which causes a debugger to show the beginning of function body to be inside
the prologue.

Patch written by Carlos Alberto Enciso.

Differential Revision: https://reviews.llvm.org/D37625

llvm-svn: 313047
2017-09-12 16:35:25 +00:00
Anna Thomas 9f1be02fa3 [LV] Clamp the VF to the trip count
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Reviewers: Ayal, zvi, hfinkel, dneilson

Reviewed by: Ayal

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37702

llvm-svn: 313046
2017-09-12 16:32:45 +00:00
Hans Wennborg 8c1eb106bd Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY"
This was causing PR34045 to fire again.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

Also revert follow-up r313010:

> [ARM] Fix typo when creating ISD::SUB nodes
>
> In D35192, I accidentally introduced a typo when creating ISD::SUB nodes,
> giving them two values instead of one.
>
> This fails when the merge_values combiner finds one of these nodes.
>
> This change fixes PR34564.
>
> Differential Revision: https://reviews.llvm.org/D37690

llvm-svn: 313044
2017-09-12 16:24:17 +00:00
Alexey Bataev 7fac4b2f25 [SLP] Test with mutiple uses of conditional op and wrong parent.
llvm-svn: 313042
2017-09-12 16:15:04 +00:00
Simon Pilgrim 76418aae74 [X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests
llvm-svn: 313039
2017-09-12 15:52:01 +00:00
Sanjay Patel 2d4e6504af [InstCombine] move related tests together; NFC
llvm-svn: 313036
2017-09-12 15:29:28 +00:00
Simon Pilgrim 0af5a772e0 [X86][AVX2] Add further instructions to scheduling tests
llvm-svn: 313032
2017-09-12 15:01:20 +00:00
Simon Pilgrim d2d2b37cc9 [X86][AVX2] Add integer broadcast scheduling tests
llvm-svn: 313026
2017-09-12 12:59:20 +00:00
Jonas Paulsson fc4f323ac1 [SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers.
This bit is needed in order for the CalleeSavedRegs list to automatically
include the super registers if all of their subregs are present.

Thanks to Wei Mi for initially indicating this deficiency in the SystemZ
backend.

Review: Ulrich Weigand.
https://bugs.llvm.org/show_bug.cgi?id=34550

llvm-svn: 313023
2017-09-12 12:11:29 +00:00
Simon Pilgrim 5a931c641e [X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests
llvm-svn: 313022
2017-09-12 11:17:01 +00:00
Simon Pilgrim ef9a9d709a [X86][AVX] Add vperm2f128 scheduling test
llvm-svn: 313021
2017-09-12 11:10:59 +00:00
Simon Pilgrim f336d9ce3c [X86][AVX2] Remove old (unused) intrinsic declarations
llvm-svn: 313020
2017-09-12 11:09:30 +00:00
Yael Tsafrir 47668b5e03 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37560

llvm-svn: 313013
2017-09-12 07:50:35 +00:00
Silviu Baranga ac920f7716 [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.

Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D17080

llvm-svn: 313012
2017-09-12 07:48:22 +00:00
Roger Ferrer Ibanez 4f92b4162f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313009
2017-09-12 07:40:09 +00:00
Craig Topper afdc36ed74 [X86] Add an extra instruction to TruncAssertSext.ll to prevent the 'or' from being narrowed so that the movl is really required to avoid a miscompile.
If we allow the OR to be narrowed then the upper bits really are zero and we can't tell if the zeroing movl was removed on purpose.

While here regenerate the test with update_llc_test_checks.py

llvm-svn: 312995
2017-09-12 03:50:44 +00:00
Craig Topper 66e4ace1c8 [X86] Rename TruncAssertZext.ll test to TruncAssertSext.ll. Since its testing AssertSext.
llvm-svn: 312991
2017-09-12 01:30:10 +00:00
Adrian Prantl 1e9a4ab031 Update testcases that are XFAILed on Darwin for llvm-dwarfdump changes.
llvm-svn: 312988
2017-09-12 01:20:29 +00:00
Hans Wennborg 075e5a2e2b Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY"
It caused PR34564.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312980
2017-09-11 23:52:02 +00:00
Yonghong Song be9c00347f bpf: add " ll" in the LD_IMM64 asmstring
This partially revert previous fix in commit f5858045aa0b
("bpf: proper print imm64 expression in inst printer").

In that commit, the original suffix "ll" is removed from
LD_IMM64 asmstring. In the customer print method, the "ll"
suffix is printed if the rhs is an immediate. For example,
"r2 = 5ll" => "r2 = 5ll", and "r3 = varll" => "r3 = var".

This has an issue though for assembler. Since assembler
relies on asmstring to do pattern matching, it will not
be able to distiguish between "mov r2, 5" and
"ld_imm64 r2, 5" since both asmstring is "r2 = 5".
In such cases, the assembler uses 64bit load for all
"r = <val>" asm insts.

This patch adds back " ll" suffix for ld_imm64 with one
additional space for "#reg = #global_var" case.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 312978
2017-09-11 23:43:35 +00:00
Adrian Prantl 424aac3687 Update testcases that are XFAILed on Darwin for llvm-dwarfdump changes.
llvm-svn: 312977
2017-09-11 23:40:44 +00:00
Vedant Kumar 8eae1f9d39 [llvm-cov] Try to fix a test on Windows
Failing bot:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/4791

This looks like another stderr redirection issue.

llvm-svn: 312975
2017-09-11 23:32:30 +00:00
Adrian Prantl 16aa4cf7ef llvm-dwarfdump: Make -brief the default and add a -verbose option instead.
Differential Revision: https://reviews.llvm.org/D37717

llvm-svn: 312972
2017-09-11 23:05:20 +00:00
Adrian Prantl 7bc1b28291 llvm-dwarfdump: Replace -debug-dump=sect option with individual options.
As discussed on llvm-dev in
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html
this changes the command line interface of llvm-dwarfdump to match the
one used by the dwarfdump utility shipping on macOS. In addition to
being shorter to type this format also has the advantage of allowing
more than one section to be specified at the same time.

In a nutshell, with this change

  $ llvm-dwarfdump --debug-dump=info
  $ llvm-dwarfdump --debug-dump=apple-objc

becomes

  $ dwarfdump --debug-info --apple-objc

Differential Revision: https://reviews.llvm.org/D37714

llvm-svn: 312970
2017-09-11 22:59:45 +00:00
Eli Friedman 50479f60c4 [llvm-cov] Allow hiding instantiation/region coverage from summary tables
Region coverage is difficult to explain without going deep into how
coverage is implemented. Instantiation coverage is easier to explain,
but probably not useful in most cases (templates don't exist in C, and
most C++ code contains relatively few templates).

This patch adds the options "-show-region-summary" and
"-show-instantiation-summary" to allow hiding those columns.
"-show-instantiation-summary" is turned off by default.

llvm-svn: 312969
2017-09-11 22:56:20 +00:00
Peter Collingbourne b9b6025328 LowerTypeTests: Add import/export support for targets without absolute symbol constants.
The rationale is the same as for r312967.

Differential Revision: https://reviews.llvm.org/D37408

llvm-svn: 312968
2017-09-11 22:49:10 +00:00
Peter Collingbourne b15a35e604 WholeProgramDevirt: Add import/export support for targets without absolute symbol constants.
Not all targets support the use of absolute symbols to export
constants. In particular, ARM has a wide variety of constant encodings
that cannot currently be relocated by linkers. So instead of exporting
the constants using symbols, export them directly in the summary.
The values of the constants are left as zeroes on targets that support
symbolic exports.

This may result in more cache misses when targeting those architectures
as a result of arbitrary changes in constant values, but this seems
somewhat unavoidable for now.

Differential Revision: https://reviews.llvm.org/D37407

llvm-svn: 312967
2017-09-11 22:34:42 +00:00
Vedant Kumar 71bc1afaab [llvm-cov] Don't attach exec counts to lines which start a skipped region
These lines by definition don't have an execution count.

This is the final part of the fix for:
https://bugs.llvm.org/show_bug.cgi?id=34166

llvm-svn: 312955
2017-09-11 21:31:32 +00:00
Sanjay Patel bb1b1c97fa [InstSimplify] fix some test names; NFC
Too much division...the quotient is the answer.

llvm-svn: 312943
2017-09-11 20:38:31 +00:00
Sanjay Patel a36eb77118 [InstSimplify] add tests for possible sdiv/srem simplifications; NFC
As noted in PR34517, the handling of signed div/rem is not on par with
unsigned div/rem. Signed is harder to reason about, but it should be
possible to handle at least some of these using the same technique that
we use for unsigned: use icmp logic to see if there's a relationship
between the quotient and divisor.

llvm-svn: 312938
2017-09-11 19:42:41 +00:00
Matt Arsenault 537bd3b906 AMDGPU: Allow coldcc calls
llvm-svn: 312936
2017-09-11 18:54:20 +00:00
Petar Jovanovic d4f3723c56 [mips][microMIPS] add lapc instruction
Implement LAPC instruction for mips32r6, mips64r6 and micromips32r6.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D35984

llvm-svn: 312934
2017-09-11 18:34:04 +00:00
Hiroshi Yamauchi 9364432cec Unmerge GEPs to reduce register pressure on IndirectBr edges.
Summary:
GEP merging can sometimes increase the number of live values and register
pressure across control edges and cause performance problems particularly if the
increased register pressure results in spills.

This change implements GEP unmerging around an IndirectBr in certain cases to
mitigate the issue. This is in the CodeGenPrepare pass (after all the GEP
merging has happened.)

With this patch, the Python interpreter loop runs faster by ~5%.

Reviewers: sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: eastig, junbuml, llvm-commits

Differential Revision: https://reviews.llvm.org/D36772

llvm-svn: 312930
2017-09-11 17:52:08 +00:00
Stanislav Mekhanoshin 710da42b86 [AMDGPU] Produce madak and madmk from the two-address pass
These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.

Differential Revision: https://reviews.llvm.org/D37389

llvm-svn: 312928
2017-09-11 17:13:57 +00:00
Zvi Rackover 255488a1e0 X86 Tests: More AVX512 conversions tests. NFC
Adding more tests for AVX512 fp<->int conversions that were missing.

llvm-svn: 312921
2017-09-11 15:54:38 +00:00
Simon Pilgrim b092bd321a [X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode
Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results.

Differential Revision: https://reviews.llvm.org/D37680

llvm-svn: 312916
2017-09-11 14:03:47 +00:00
Tim Renouf 660ba2b8af [AMDGPU] exp should not be in WQM mode
A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports
the exec mask as the valid mask to determine which pixels to render.

This commit marks any exp as needing to be in exact mode.

Actually, if there are multiple mrt exps, only one needs to have vm=1,
and only that one needs to be in exact mode. But that is an optimization
for another day.

Differential Revision: https://reviews.llvm.org/D36305

llvm-svn: 312915
2017-09-11 13:55:39 +00:00
Simon Pilgrim d0ff65b50e [X86][SSE] Add further test cases showing failure to compute sign bits through PACKSS
Suggested in D37680

Note: had to drop AVX512VL tests as there is an infinite loop in the new tests that needs further investigation (not relevant to D37680).
llvm-svn: 312910
2017-09-11 12:18:43 +00:00
Gadi Haber 3ddffced43 [X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag
NFC.
 Updated 3 Codegen regression tests to use the -mattr flag instead of the -mcpu flags as follows:
 Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
 Instead of -mcpu=knl use -mattr=+avx512f

Reviewers: delena
Revision: https://reviews.llvm.org/D37674
llvm-svn: 312909
2017-09-11 11:26:20 +00:00
Andre Vieira c429aabb91 [ARM] Enable the use of SVC anywhere in an IT block
Differential Revision: https://reviews.llvm.org/D37374

llvm-svn: 312908
2017-09-11 11:11:17 +00:00
Michael Zuckerman 9707ba0957 [Interleved][Stride 3]Adding test for case the VF=64 target with AVX512.
llvm-svn: 312907
2017-09-11 10:57:15 +00:00
Simon Pilgrim f6fa1d0369 [X86][SSE] Add test showing failure to compute sign bits through PACKSS
Prevents combineLogicBlendIntoPBLENDV from merging to PBLENDV

llvm-svn: 312906
2017-09-11 10:50:03 +00:00
Dylan McKay 0fc5fe0a58 [AVR] Enable the '__do_copy_data' function
Also enables '__do_clear_bss'.

These functions are automaticalled called by the CRT if they are
declared.

We need these to be called otherwise RAM will start completely
uninitialised, even though we need to copy RAM variables from progmem to
RAM.

llvm-svn: 312905
2017-09-11 10:32:51 +00:00
Igor Breger 1f14364d64 [GlobalISel][X86] G_ANYEXT support.
Summary: G_ANYEXT support

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37675

llvm-svn: 312903
2017-09-11 09:41:13 +00:00
Ilya Biryukov d386c299a2 Fixed a typo in llvm-cov/deferred-region.cpp test.
Input redirection was using `2&>1` instead of `2>&1`.

llvm-svn: 312902
2017-09-11 09:22:44 +00:00
Roger Ferrer Ibanez 12b20f2307 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312898
2017-09-11 07:38:05 +00:00
Elena Demikhovsky cc477bbcea Fixed a bug in splitting Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken.
I'm chaining 2 nodes to prevent reordering.

Differential Revision https://reviews.llvm.org/D37670

llvm-svn: 312894
2017-09-11 06:18:15 +00:00
Sanjay Patel 5876189ff1 [InstSimplify] refactor udiv/urem code and add tests; NFCI
This removes some duplicated code and makes it easier to support signed div/rem
in a similar way if we want to do that. Note that the existing comments were not
accurate - we don't need a constant divisor to simplify; icmp simplification does
more than that. But as the added tests show, it could go even further.

llvm-svn: 312885
2017-09-10 17:55:08 +00:00
Elena Demikhovsky 9afc3d7b82 Added a test that demonstrates a ug in Scatter scheduling.
The bug is going to be fixed in an upcomming patch.
 

llvm-svn: 312883
2017-09-10 13:20:42 +00:00
Simon Pilgrim ed27bea373 [X86] Add v2i4 store test case (PR20012)
llvm-svn: 312874
2017-09-09 20:28:50 +00:00
Simon Pilgrim e932c7fafa [X86] Add v2i2 test case (PR20011)
llvm-svn: 312873
2017-09-09 20:22:35 +00:00
Simon Pilgrim da41ca5a25 [X86][FMA] Regenerate FMA tests
llvm-svn: 312871
2017-09-09 19:25:59 +00:00
Nuno Lopes 404f106d71 Merge isKnownNonNull into isKnownNonZero
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null

Differential Revision: https://reviews.llvm.org/D37628

llvm-svn: 312869
2017-09-09 18:23:11 +00:00
Simon Pilgrim 97a56866a2 [X86][SSE] i32 vector multiplications test cases from PR6399
llvm-svn: 312868
2017-09-09 18:18:17 +00:00
Simon Pilgrim a866a190d6 [X86][MOVBE] Fix typo in MOVBE scheduling test names
Copy+paste is not your friend

llvm-svn: 312867
2017-09-09 17:52:44 +00:00
Craig Topper 3be1db82b6 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

llvm-svn: 312866
2017-09-09 17:11:59 +00:00
Sanjay Patel 59562ecb35 [DivRemPairs] split tests per target to account for bots that don't build for all targets
llvm-svn: 312863
2017-09-09 14:10:59 +00:00
Sanjay Patel 6fd4391ddd [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).

Differential Revision: https://reviews.llvm.org/D37121 

llvm-svn: 312862
2017-09-09 13:38:18 +00:00
Kyle Butt 8c0314c3ed PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.

llvm-svn: 312843
2017-09-09 00:37:56 +00:00
Yonghong Song 6807778e52 bpf: fix test failures due to previous bpf change of assembly code syntax
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 312840
2017-09-09 00:11:13 +00:00
Guozhi Wei 62d6414465 [TargetTransformInfo] Add a new public interface getInstructionCost
Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface:

  enum TargetCostKind {
    TCK_RecipThroughput, ///< Reciprocal throughput.
    TCK_Latency,         ///< The latency of instruction.
    TCK_CodeSize         ///< Instruction code size.
  };

  int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model.

This patch also provides a simple default implementation of getInstructionLatency.

The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways:

   Add more detail into this function.
   Add getXXXLatency function and call it from here.
   Implement target specific getInstructionLatency function.

Differential Revision: https://reviews.llvm.org/D37170

llvm-svn: 312832
2017-09-08 22:29:17 +00:00
David Blaikie bd20f84a63 Migrate llvm-symbolizer tests to not use %T
(context around the %T removal here: https://reviews.llvm.org/D35396 )

llvm-svn: 312828
2017-09-08 21:10:01 +00:00
Vedant Kumar 3292c33110 [llvm-cov] Use portable output redirection in a test
A follow-up to a test fix (r312825).

llvm-svn: 312826
2017-09-08 20:24:23 +00:00
Vedant Kumar e1e301a08a [llvm-cov] Try to appease a Windows bot
On a Windows bot, I see a FileCheck error where the source being matched
over no longer exists, i.e it seems like it's FileCheck'ing some stale
output:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/4747

You can see "// CHECK: [[@LINE]]|{{ +}Marker at 19:3 = 1" in the
FileCheck stderr, but that CHECK line doesn't exist.

Remove the input file to FileCheck before running the test, to try and
appease the bot.

llvm-svn: 312825
2017-09-08 20:18:17 +00:00
Vedant Kumar 57acd0ad01 [llvm-cov] Disable name-compression in a test binary
This should fix the lld bot:

The Buildbot has detected a new failure on builder llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast while building cfe.
Full details are available at:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/16993

llvm-svn: 312821
2017-09-08 19:08:39 +00:00
Matt Arsenault 2f4df7ec41 AMDGPU: Recompute scc liveness
The various scalar bit operations set SCC,
so one is erased or moved it needs to be recomputed.
Not sure why the existing tests don't fail on this.

llvm-svn: 312819
2017-09-08 18:51:26 +00:00
Vedant Kumar 79a1b5ee5a [Coverage] Build sorted and unique segments
A coverage segment contains a starting line and column, an execution
count, and some other metadata. Clients of the coverage library use
segments to prepare line-oriented reports.

Users of the coverage library depend on segments being unique and sorted
in source order. Currently this is not guaranteed (this is why the clang
change which introduced deferred regions was reverted).

This commit documents the "unique and sorted" condition and asserts that
it holds. It also fixes the SegmentBuilder so that it produces correct
output in some edge cases.

Testing: I've added unit tests for some edge cases. I've also checked
that the new SegmentBuilder implementation is fully covered. Apart from
running check-profile and the llvm-cov tests, I've successfully used a
stage1 llvm-cov to prepare a coverage report for an instrumented clang
binary.

Differential Revision: https://reviews.llvm.org/D36813

llvm-svn: 312817
2017-09-08 18:44:50 +00:00
Vedant Kumar 72c3a11488 [llvm-cov] Fix a lifetime issue
This fixes an issue where a std::string was moved to a constructor
which accepted a StringRef.

llvm-svn: 312816
2017-09-08 18:44:49 +00:00
Vedant Kumar bae8397006 [Coverage] Report errors when reading malformed source regions
Each source region has a start and end location. Report an error when
the end location does not precede the begin location.

The old lineExecutionCounts.covmapping test actually had a buggy source
region in it. This commit introduces a regenerated copy of the coverage
and moves the old copy to malformedRegions.covmapping, for a test.

Differential Revision: https://reviews.llvm.org/D37387

llvm-svn: 312814
2017-09-08 18:44:47 +00:00
Vedant Kumar 933b37f99f [llvm-cov] Unify region marker placement between text/html modes
Make sure that the text and html emitters always emit the same set of
region markers, and avoid emitting redundant markers for line segments
which don't end on the line they start on.

This is related to D35925, and depends on D36014

Differential Revision: https://reviews.llvm.org/D36020

llvm-svn: 312813
2017-09-08 18:44:46 +00:00
Craig Topper 56af2cad89 [X86] Simplify the slow-incdec test and add test cases with optsize.
I think we want to consider using inc/dec with optsize.

llvm-svn: 312804
2017-09-08 17:33:54 +00:00
Wei Mi 5d84d9b35c Fix a bug for rL312641.
rL312641 Allowed llvm.memcpy/memset/memmove to be tail calls when parent
function return the intrinsics's first argument. However on arm-none-eabi
platform, llvm.memcpy will be expanded to __aeabi_memcpy which doesn't
have return value. The fix is to check the libcall name after expansion
to match "memcpy/memset/memmove" before allowing those intrinsic to be
tail calls.

llvm-svn: 312799
2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek f78eca8fb5 Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits
Differential Revision: https://reviews.llvm.org/D37600

llvm-svn: 312797
2017-09-08 16:29:50 +00:00
Alexey Bataev 6dd29fccb8 [SLP] Support for horizontal min/max reduction.
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.

Differential revision: https://reviews.llvm.org/D27846

llvm-svn: 312791
2017-09-08 13:49:36 +00:00
Simon Pilgrim 2e4fb24173 [X86] Added PR31045 test case
Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633

llvm-svn: 312786
2017-09-08 10:49:11 +00:00
Max Kazantsev d7b0f74c64 Re-enable "[IRCE] Identify loops with latch comparison against current IV value"
Re-applying after the found bug was fixed.

Differential Revision: https://reviews.llvm.org/D36215

llvm-svn: 312783
2017-09-08 10:15:05 +00:00
Jatin Bhateja a251312719 [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'
Differential Revision: https://reviews.llvm.org/D37614

llvm-svn: 312778
2017-09-08 09:15:36 +00:00
Max Kazantsev 57db44838d diff --git a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
index f72a808..9fa49fd 100644
--- a/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
+++ b/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
@@ -450,20 +450,10 @@ struct LoopStructure {
   // equivalent to:
   //
   // intN_ty inc = IndVarIncreasing ? 1 : -1;
-  // pred_ty predicate = IndVarIncreasing
-  //                         ? IsSignedPredicate ? ICMP_SLT : ICMP_ULT
-  //                         : IsSignedPredicate ? ICMP_SGT : ICMP_UGT;
+  // pred_ty predicate = IndVarIncreasing ? ICMP_SLT : ICMP_SGT;
   //
-  //
-  // for (intN_ty iv = IndVarStart; predicate(IndVarBase, LoopExitAt);
-  //      iv = IndVarNext)
+  // for (intN_ty iv = IndVarStart; predicate(iv, LoopExitAt); iv = IndVarBase)
   //   ... body ...
-  //
-  // Here IndVarBase is either current or next value of the induction variable.
-  // in the former case, IsIndVarNext = false and IndVarBase points to the
-  // Phi node of the induction variable. Otherwise, IsIndVarNext = true and
-  // IndVarBase points to IV increment instruction.
-  //
 
   Value *IndVarBase;
   Value *IndVarStart;
@@ -471,13 +461,12 @@ struct LoopStructure {
   Value *LoopExitAt;
   bool IndVarIncreasing;
   bool IsSignedPredicate;
-  bool IsIndVarNext;
 
   LoopStructure()
       : Tag(""), Header(nullptr), Latch(nullptr), LatchBr(nullptr),
         LatchExit(nullptr), LatchBrExitIdx(-1), IndVarBase(nullptr),
         IndVarStart(nullptr), IndVarStep(nullptr), LoopExitAt(nullptr),
-        IndVarIncreasing(false), IsSignedPredicate(true), IsIndVarNext(false) {}
+        IndVarIncreasing(false), IsSignedPredicate(true) {}
 
   template <typename M> LoopStructure map(M Map) const {
     LoopStructure Result;
@@ -493,7 +482,6 @@ struct LoopStructure {
     Result.LoopExitAt = Map(LoopExitAt);
     Result.IndVarIncreasing = IndVarIncreasing;
     Result.IsSignedPredicate = IsSignedPredicate;
-    Result.IsIndVarNext = IsIndVarNext;
     return Result;
   }
 
@@ -841,42 +829,21 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
     return false;
   };
 
-  // `ICI` can either be a comparison against IV or a comparison of IV.next.
-  // Depending on the interpretation, we calculate the start value differently.
+  // `ICI` is interpreted as taking the backedge if the *next* value of the
+  // induction variable satisfies some constraint.
 
-  // Pair {IndVarBase; IsIndVarNext} semantically designates whether the latch
-  // comparisons happens against the IV before or after its value is
-  // incremented. Two valid combinations for them are:
-  //
-  // 1) { phi [ iv.start, preheader ], [ iv.next, latch ]; false },
-  // 2) { iv.next; true }.
-  //
-  // The latch comparison happens against IndVarBase which can be either current
-  // or next value of the induction variable.
   const SCEVAddRecExpr *IndVarBase = cast<SCEVAddRecExpr>(LeftSCEV);
   bool IsIncreasing = false;
   bool IsSignedPredicate = true;
-  bool IsIndVarNext = false;
   ConstantInt *StepCI;
   if (!IsInductionVar(IndVarBase, IsIncreasing, StepCI)) {
     FailureReason = "LHS in icmp not induction variable";
     return None;
   }
 
-  const SCEV *IndVarStart = nullptr;
-  // TODO: Currently we only handle comparison against IV, but we can extend
-  // this analysis to be able to deal with comparison against sext(iv) and such.
-  if (isa<PHINode>(LeftValue) &&
-      cast<PHINode>(LeftValue)->getParent() == Header)
-    // The comparison is made against current IV value.
-    IndVarStart = IndVarBase->getStart();
-  else {
-    // Assume that the comparison is made against next IV value.
-    const SCEV *StartNext = IndVarBase->getStart();
-    const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
-    IndVarStart = SE.getAddExpr(StartNext, Addend);
-    IsIndVarNext = true;
-  }
+  const SCEV *StartNext = IndVarBase->getStart();
+  const SCEV *Addend = SE.getNegativeSCEV(IndVarBase->getStepRecurrence(SE));
+  const SCEV *IndVarStart = SE.getAddExpr(StartNext, Addend);
   const SCEV *Step = SE.getSCEV(StepCI);
 
   ConstantInt *One = ConstantInt::get(IndVarTy, 1);
@@ -1060,7 +1027,6 @@ LoopStructure::parseLoopStructure(ScalarEvolution &SE,
   Result.IndVarIncreasing = IsIncreasing;
   Result.LoopExitAt = RightValue;
   Result.IsSignedPredicate = IsSignedPredicate;
-  Result.IsIndVarNext = IsIndVarNext;
 
   FailureReason = nullptr;
 
@@ -1350,9 +1316,8 @@ LoopConstrainer::RewrittenRangeInfo LoopConstrainer::changeIterationSpaceEnd(
                                       BranchToContinuation);
 
     NewPHI->addIncoming(PN->getIncomingValueForBlock(Preheader), Preheader);
-    auto *FixupValue =
-        LS.IsIndVarNext ? PN->getIncomingValueForBlock(LS.Latch) : PN;
-    NewPHI->addIncoming(FixupValue, RRI.ExitSelector);
+    NewPHI->addIncoming(PN->getIncomingValueForBlock(LS.Latch),
+                        RRI.ExitSelector);
     RRI.PHIValuesAtPseudoExit.push_back(NewPHI);
   }
 
@@ -1735,10 +1700,7 @@ bool InductiveRangeCheckElimination::runOnLoop(Loop *L, LPPassManager &LPM) {
   }
   LoopStructure LS = MaybeLoopStructure.getValue();
   const SCEVAddRecExpr *IndVar =
-      cast<SCEVAddRecExpr>(SE.getSCEV(LS.IndVarBase));
-  if (LS.IsIndVarNext)
-    IndVar = cast<SCEVAddRecExpr>(SE.getMinusSCEV(IndVar,
-                                                  SE.getSCEV(LS.IndVarStep)));
+      cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep)));
 
   Optional<InductiveRangeCheck::Range> SafeIterRange;
   Instruction *ExprInsertPt = Preheader->getTerminator();
diff --git a/test/Transforms/IRCE/latch-comparison-against-current-value.ll b/test/Transforms/IRCE/latch-comparison-against-current-value.ll
deleted file mode 100644
index afea0e6..0000000
--- a/test/Transforms/IRCE/latch-comparison-against-current-value.ll
+++ /dev/null
@@ -1,182 +0,0 @@
-; RUN: opt -verify-loop-info -irce-print-changed-loops -irce -S < %s 2>&1 | FileCheck %s
-
-; Check that IRCE is able to deal with loops where the latch comparison is
-; done against current value of the IV, not the IV.next.
-
-; CHECK: irce: in function test_01: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK: irce: in function test_02: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_03: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-; CHECK-NOT: irce: in function test_04: constrained Loop at depth 1 containing: %loop<header><exiting>,%in.bounds<latch><exiting>
-
-; SLT condition for increasing loop from 0 to 100.
-define void @test_01(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_01
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp slt i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp slt i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp slt i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp slt i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp slt i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp slt i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; ULT condition for increasing loop from 0 to 100.
-define void @test_02(i32* %arr, i32* %a_len_ptr) #0 {
-
-; CHECK:      test_02
-; CHECK:        entry:
-; CHECK-NEXT:     %exit.mainloop.at = load i32, i32* %a_len_ptr, !range !0
-; CHECK-NEXT:     [[COND2:%[^ ]+]] = icmp ult i32 0, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND2]], label %loop.preheader, label %main.pseudo.exit
-; CHECK:        loop:
-; CHECK-NEXT:     %idx = phi i32 [ %idx.next, %in.bounds ], [ 0, %loop.preheader ]
-; CHECK-NEXT:     %idx.next = add nuw nsw i32 %idx, 1
-; CHECK-NEXT:     %abc = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 true, label %in.bounds, label %out.of.bounds.loopexit1
-; CHECK:        in.bounds:
-; CHECK-NEXT:     %addr = getelementptr i32, i32* %arr, i32 %idx
-; CHECK-NEXT:     store i32 0, i32* %addr
-; CHECK-NEXT:     %next = icmp ult i32 %idx, 100
-; CHECK-NEXT:     [[COND3:%[^ ]+]] = icmp ult i32 %idx, %exit.mainloop.at
-; CHECK-NEXT:     br i1 [[COND3]], label %loop, label %main.exit.selector
-; CHECK:        main.exit.selector:
-; CHECK-NEXT:     %idx.lcssa = phi i32 [ %idx, %in.bounds ]
-; CHECK-NEXT:     [[COND4:%[^ ]+]] = icmp ult i32 %idx.lcssa, 100
-; CHECK-NEXT:     br i1 [[COND4]], label %main.pseudo.exit, label %exit
-; CHECK-NOT: loop.preloop:
-; CHECK:        loop.postloop:
-; CHECK-NEXT:    %idx.postloop = phi i32 [ %idx.copy, %postloop ], [ %idx.next.postloop, %in.bounds.postloop ]
-; CHECK-NEXT:     %idx.next.postloop = add nuw nsw i32 %idx.postloop, 1
-; CHECK-NEXT:     %abc.postloop = icmp ult i32 %idx.postloop, %exit.mainloop.at
-; CHECK-NEXT:     br i1 %abc.postloop, label %in.bounds.postloop, label %out.of.bounds.loopexit
-
-entry:
-  %len = load i32, i32* %a_len_ptr, !range !0
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %abc = icmp ult i32 %idx, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_01, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_03(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_03
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp slt i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp slt i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-; Same as test_02, but comparison happens against IV extended to a wider type.
-; This test ensures that IRCE rejects it and does not falsely assume that it was
-; a comparison against iv.next.
-; TODO: We can actually extend the recognition to cover this case.
-define void @test_04(i32* %arr, i64* %a_len_ptr) #0 {
-
-; CHECK:      test_04
-
-entry:
-  %len = load i64, i64* %a_len_ptr, !range !1
-  br label %loop
-
-loop:
-  %idx = phi i32 [ 0, %entry ], [ %idx.next, %in.bounds ]
-  %idx.next = add nsw nuw i32 %idx, 1
-  %idx.ext = sext i32 %idx to i64
-  %abc = icmp ult i64 %idx.ext, %len
-  br i1 %abc, label %in.bounds, label %out.of.bounds
-
-in.bounds:
-  %addr = getelementptr i32, i32* %arr, i32 %idx
-  store i32 0, i32* %addr
-  %next = icmp ult i32 %idx, 100
-  br i1 %next, label %loop, label %exit
-
-out.of.bounds:
-  ret void
-
-exit:
-  ret void
-}
-
-!0 = !{i32 0, i32 50}
-!1 = !{i64 0, i64 50}

llvm-svn: 312775
2017-09-08 04:26:41 +00:00
Adrian Prantl 99ba97726a Fix a crash when emitting debug info for multi-reg function arguments
by reusing more of the existing machinery

This is a follow-up to r312169.
Thanks to Björn Pettersson for the testcase!

llvm-svn: 312773
2017-09-08 02:31:37 +00:00
Dean Michael Berris 711dec260f [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

llvm-svn: 312772
2017-09-08 01:47:56 +00:00
Chandler Carruth acbcf06f03 [x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

llvm-svn: 312768
2017-09-08 00:17:12 +00:00