Commit Graph

17034 Commits

Author SHA1 Message Date
Matthias Braun fdc4c6b426 Revert "RegScavenging: Add scavengeRegisterBackwards()"
The ppc64 multistage bot fails on this.

This reverts commit r279124.

Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.

llvm-svn: 279199
2016-08-19 03:03:24 +00:00
Kyle Butt 780b517d6b CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279168
2016-08-18 22:09:27 +00:00
Zhan Jun Liau cf2f4b3251 [SystemZ] Use valid base/index regs for inline asm
Summary:
Inline asm memory constraints can have the base or index register be assigned
to %r0 right now. Make sure that we assign only ADDR64 registers to the base
and index.

Reviewers: uweigand

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23367

llvm-svn: 279157
2016-08-18 21:44:15 +00:00
Tom Stellard a1619cd9aa AMDGPU/SI: Fix a test in wqm.ll to always use s_cbranch_vcc*
Summary:
We need to use floating-point compares to ensure that s_cbranch_vcc*
instructions are always generated.  With integer compares, future
optimizations could cause s_cbranch_scc* to be generated instead.

Reviewers: arsenm, nhaehnle

Subscribers: llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23401

llvm-svn: 279148
2016-08-18 21:21:53 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Matthias Braun 075d0c23d5 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044 with off-by-1 instruction fix for the reload placement.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 279124
2016-08-18 19:47:59 +00:00
Simon Pilgrim 99fd9c5f56 [X86][SSE] Missed insertps shuffle patterns
llvm-svn: 279111
2016-08-18 18:19:28 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Elliot Colp 687691aeac Fix SystemZ compilation abort caused by negative AND mask
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.

This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set, so we will detect a case for NILL "naturally" rather than using peephole optimizations.

Differential Revision: http://reviews.llvm.org/D21854

llvm-svn: 279105
2016-08-18 18:04:26 +00:00
Duncan P. N. Exon Smith 84c2da47f9 AArch64: Don't call getIterator() on iterators
Remove an unnecessary round-trip:

    iterator => operator->() => getIterator()

In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).

The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.

Fixes PR29035.

llvm-svn: 279104
2016-08-18 17:58:09 +00:00
Dan Gohman c9623db884 [WebAssembly] Disable the store-results optimization.
The WebAssemly spec removing the return value from store instructions, so
remove the associated optimization from LLVM.

This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a seperate patch.

llvm-svn: 279100
2016-08-18 17:51:27 +00:00
Ahmed Bougacha 33e19fe1c4 [AArch64][GlobalISel] Select floating-point binary ops.
There is no FREM instruction, but the others are straightforward.

llvm-svn: 279081
2016-08-18 16:05:11 +00:00
Ahmed Bougacha 71d033a17f [GlobalISel] Add floating-point binary ops.
llvm-svn: 279080
2016-08-18 16:05:06 +00:00
Derek Schuff ccdceda128 [WebAssembly] Refactor WebAssemblyLowerEmscriptenException pass for setjmp/longjmp
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.

1. Change the file/pass name to WebAssemblyLowerEmscriptenExceptions ->
WebAssemblyLowerEmscriptenEHSjLj to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes

Patch by Heejin Ahn

Differential Revision: https://reviews.llvm.org/D23588

llvm-svn: 279075
2016-08-18 15:27:25 +00:00
Ahmed Bougacha 1d0560b14d [AArch64][GlobalISel] Select G_SDIV/G_UDIV.
There is no REM instruction; that will require an expansion.
It's not obvious that should be done in select, rather than as a
(custom?) legalization.

llvm-svn: 279074
2016-08-18 15:17:13 +00:00
Ahmed Bougacha 13db94540c [GlobalISel] Add support for DIV/REM.
llvm-svn: 279073
2016-08-18 15:17:01 +00:00
Krzysztof Parzyszek b1b0372337 [Hexagon] Create vcombine in HexagonCopyToCombine
llvm-svn: 279067
2016-08-18 14:12:34 +00:00
Simon Pilgrim ab7c46eccf [X86][SSE] Add SSE1 tests to make sure we don't merge loads on illegal types
llvm-svn: 279065
2016-08-18 13:41:26 +00:00
Simon Dardis ea3431598e [mips] Correct tail call encoding for MIPSR6
r277708 enabled tails calls for MIPS but used the 'jr' instruction when the 
jump target was held in a register. For MIPSR6, 'jalr $zero, $reg' should
have been used. Additionally, add missing patterns for external and global
symbols for tail calls.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23301

llvm-svn: 279064
2016-08-18 13:22:43 +00:00
Matthias Braun c9130ea6a3 Testcase for r279022
llvm-svn: 279031
2016-08-18 02:21:54 +00:00
Tim Northover de3aea0412 GlobalISel: support irtranslation of icmp instructions.
llvm-svn: 278969
2016-08-17 20:25:25 +00:00
Marina Yatsina 53ce3f9d02 Fix for PR29010
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=29010
Root cause of the bug is that the register class of the machine instruction operand does not fully reflect if this registers that can be allocated.
Both for i386 and x86_64 the operand's register class is VR128RegClass and thus contains xmm0-xmm15, though in i386 we can only use xmm0-xmm8.
In order to get the actual allocable registers of the class we need to use RegisterClassInfo.

Differential Revision: https://reviews.llvm.org/D23613

llvm-svn: 278954
2016-08-17 19:07:40 +00:00
Jonas Paulsson 7a79422536 [LoopStrenghtReduce] Refactoring and addition of a new target cost function.
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.

New target hook isFoldableMemAccessOffset(), which is used during formula
rating.

For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.

Updated tests:
test/CodeGen/SystemZ/loop-01.ll

Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152

llvm-svn: 278927
2016-08-17 13:24:19 +00:00
Ayman Musa 71b43c5c1d Fix bug in DAGBuilder for getelementptr with expanded vector.
Replacing the usage of MVT with EVT in case the vector type is expanded.
Differential Revision: https://reviews.llvm.org/D23306

llvm-svn: 278913
2016-08-17 07:52:15 +00:00
Chuang-Yu Cheng f7ba716bcb [ppc64] Don't apply sibling call optimization if callee has any byval arg
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441

llvm-svn: 278900
2016-08-17 03:17:44 +00:00
Kyle Butt 07d61425e3 Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough.
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.

Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.

llvm-svn: 278866
2016-08-16 22:56:14 +00:00
Sanjay Patel 904cd39b05 [x86] Allow merging multiple instances of an immediate within a basic block for code size savings, for 64-bit constants.
This patch handles 64-bit constants which can be encoded as 32-bit immediates.

It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants.

Patch by Sunita Marathe!

Differential Revision: https://reviews.llvm.org/D23391

llvm-svn: 278857
2016-08-16 21:35:16 +00:00
Haicheng Wu 9780df5385 [BranchFolding] Change a test case of r278575.
Rename the operands to make the test less brittle.

llvm-svn: 278841
2016-08-16 20:06:25 +00:00
Sjoerd Meijer 15c81b05ea [MBP] do not reorder and move up loop latch block
Do not reorder and move up a loop latch block before a loop header
when optimising for size because this will generate an extra 
unconditional branch.

Differential Revision: https://reviews.llvm.org/D22521

llvm-svn: 278840
2016-08-16 19:50:33 +00:00
Simon Dardis 4893aff94e [mips] Enforce compact branch restrictions
Check both operands for use of the $zero register which cannot be used with
a compact branch instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D23547

llvm-svn: 278824
2016-08-16 17:16:11 +00:00
Wei Mi db68c9adbd Remove a stale comment from the test, NFC.
llvm-svn: 278821
2016-08-16 16:57:15 +00:00
Ahmed Bougacha e4c03abddd [AArch64][GlobalISel] Select G_MUL.
llvm-svn: 278810
2016-08-16 14:37:46 +00:00
Brendon Cahoon 65b6ebccad [Pipeliner] Fix an asssert due to invalid Phi in the epilog
The pipeliner was generating an invalid Phi name for an operand
in the epilog block, which caused an assert in the live variable
analysis pass. The fix is to the code that generates new Phis
in the epilog block. In this case, there is an existing Phi that
needs to be reused rather than creating a new Phi instruction.

Differential Revision: https://reviews.llvm.org/D23513

llvm-svn: 278805
2016-08-16 14:29:24 +00:00
Ahmed Bougacha 2ac5bf94bc [AArch64][GlobalISel] Select (variable) shifts.
For now, no support for immediates.

llvm-svn: 278804
2016-08-16 14:02:47 +00:00
Ahmed Bougacha 7e508a8fcd [AArch64][GlobalISel] Robustize select tests. NFC.
Using the same register means nothing was checking for operand order.

llvm-svn: 278803
2016-08-16 14:02:44 +00:00
Ahmed Bougacha 0306b5ef07 [AArch64][GlobalISel] Select p0 G_FRAME_INDEX.
And mark it as legal.

llvm-svn: 278802
2016-08-16 14:02:42 +00:00
Simon Pilgrim 25d2506029 [X86][AVX] Fixed typo in zero element insertion
llvm-svn: 278798
2016-08-16 13:33:33 +00:00
Ron Lieberman a481c7db93 [Hexagon] Improve test to check for @PCREL, only run llc, not opt -> llc.
llvm-svn: 278796
2016-08-16 13:10:09 +00:00
Simon Pilgrim cc316f013a [X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations
The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion

llvm-svn: 278794
2016-08-16 12:52:06 +00:00
Simon Pilgrim d2d3202532 [X86][AVX512BW] Updated tests to demonstrate AVX512BW's inability to vectorize v64i8 shifts
llvm-svn: 278790
2016-08-16 11:05:47 +00:00
Simon Pilgrim f16cd361d4 [X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations
llvm-svn: 278787
2016-08-16 10:03:23 +00:00
Guy Blank 722caebdae [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
Eli Friedman 98151d6440 Fix typo in lowering for fp128 ueq.
Regression from r259791.

Differential Revision: https://reviews.llvm.org/D23374

llvm-svn: 278750
2016-08-15 21:46:19 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Tim Northover 28fdc4272d GlobalISel: support loads and stores of strange types.
Before we mischaracterized structs and i1 types as a scalar with size 0 in
various ways.

llvm-svn: 278744
2016-08-15 21:13:17 +00:00
Reid Kleckner bb8652312a Fix WAsm test after LSR change in r278658
Now the increment is done in a different location

llvm-svn: 278713
2016-08-15 18:51:42 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
Reid Kleckner 70a600b8bb Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r278660.

It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.

llvm-svn: 278672
2016-08-15 15:42:31 +00:00
James Molloy 9a3c82f5cf [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 278660
2016-08-15 08:04:56 +00:00
James Molloy 196ad0823e [LSR] Don't try and create post-inc expressions on non-rotated loops
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

    void f(float *a, float *b, float *c, int n) {
      while (n-- > 0)
        *c++ = *a++ + *b++;
    }

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

llvm-svn: 278658
2016-08-15 07:53:03 +00:00