Commit Graph

19486 Commits

Author SHA1 Message Date
Hans Wennborg 1c9d800fbc Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other"
It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr".

llvm-svn: 296926
2017-03-03 23:19:31 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Tim Northover bf017293af GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.

I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.

llvm-svn: 296921
2017-03-03 22:46:09 +00:00
Sanjay Patel 000d61acfd [x86] regenerate checks; NFC
llvm-svn: 296908
2017-03-03 20:48:54 +00:00
Sanjay Patel 1716aa45f1 [x86] regenerate checks; NFC
llvm-svn: 296883
2017-03-03 16:58:51 +00:00
Sanjay Patel 9f5db7d4e0 [x86] regenerate checks; NFC
llvm-svn: 296881
2017-03-03 16:45:57 +00:00
Sanjay Patel 872e0b86eb [x86] regenerate checks; NFC
llvm-svn: 296880
2017-03-03 16:42:43 +00:00
Sanjay Patel 6fa6316769 [x86] regenerate checks; NFC
llvm-svn: 296877
2017-03-03 16:34:35 +00:00
Ranjeet Singh 7b60a9ed0c [ARM] fpscr read/write intrinsics not aware of each other
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.

A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as a IntroNoMem which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been write to this register, e.g.

void foo(int *p) {
     p[0] = __builtin_arm_get_fpscr();
     __builtin_arm_set_fpscr(1);
     p[1] = __builtin_arm_get_fpscr();
}

in the above example the second read is currently CSE'd into the first read,
this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
effects the same register that __builtin_arm_get_fpscr reads from, to fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.

Differential Revision: https://reviews.llvm.org/D30542

llvm-svn: 296865
2017-03-03 11:40:07 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Amjad Aboud 4f97751798 [X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
2017-03-03 09:03:24 +00:00
Igor Breger 321cf3c650 [GlobalISel][X86] Support float/double and vector types.
Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector.

Reviewers: delena, zvi

Reviewed By: zvi

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30533

llvm-svn: 296856
2017-03-03 08:06:46 +00:00
Kyle Butt 1fa6030767 CodeGen: BlockPlacement: Precompute layout for chains of triangles.
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
   longer chains)
2) The chains are statically correlated. Branch probabilities have a very
   U-shaped distribution.
   [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
   If the branches in a chain are likely to be from the same side of the
   distribution as their predecessor, but are independent at runtime, this
   transformation is profitable. (Because the cost of being wrong is a small
   fixed cost, unlike the standard triangle layout where the cost of being
   wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
   branch was taken positively influences whether the next branch will be
   taken
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845
2017-03-03 01:00:22 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Krzysztof Parzyszek e720feb1c6 [Hexagon] Pick the right branch opcode depending on branch probabilities
Specifically, pick the opcode with the correct branch prediction, i.e.
jump:t or jump:nt.

llvm-svn: 296821
2017-03-02 21:49:49 +00:00
Tobias Grosser 02d86b80ec Revert "AMDGPU: Re-do update for branch-relaxation test"
This commit also relied on r296812, which I just reverted. We should probably
apply it again, after the r296812 has been discussed and been reapplied in some
variant.

llvm-svn: 296820
2017-03-02 21:47:51 +00:00
Kyle Butt 1393761e0c CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818
2017-03-02 21:44:24 +00:00
Eli Friedman bb821276d0 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296815
2017-03-02 21:39:39 +00:00
Guozhi Wei ed28e742ee [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.

Differential Revision: https://reviews.llvm.org/D30362

llvm-svn: 296811
2017-03-02 21:07:59 +00:00
Chad Rosier ea25eca04a [AArch64] Extend redundant copy elimination pass to handle non-zero stores.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle non-zero cases such as:

BB#0:
  cmp x0, #1
  b.eq .LBB0_1
.LBB0_1:
  orr x0, xzr, #0x1  ; <-- redundant copy; x0 known to hold #1.

Differential Revision: https://reviews.llvm.org/D29344

llvm-svn: 296809
2017-03-02 20:48:11 +00:00
Vadzim Dambrouski eafb805506 [MSP430] Add SRet support to MSP430 target
This patch adds support for struct return values to the MSP430
target backend. It also reverses the order of argument and return
registers in the calling convention to bring it into closer
alignment with the published EABI from TI.

Patch by Andrew Wygle (awygle).

Differential Revision: https://reviews.llvm.org/D29069

llvm-svn: 296807
2017-03-02 20:25:10 +00:00
Simon Pilgrim b3067dc374 [X86][MMX] Fixed i32 extraction on 32-bit targets
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal

llvm-svn: 296782
2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek 056c945a5d [Hexagon] Skip blocks that define vector predicate registers in early-if
llvm-svn: 296777
2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek fcbb7d10fe [Hexagon] Properly handle 'q' constraint in 128-byte vector mode
llvm-svn: 296772
2017-03-02 17:50:24 +00:00
Nemanja Ivanovic db8425eff0 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

llvm-svn: 296771
2017-03-02 17:38:59 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Tim Northover e80d6d1360 GlobalISel: record correct stack usage for signext parameters.
The CallingConv.td rules allocate 8 bytes for these kinds of arguments
on AAPCS targets, but we were only recording the smaller amount. The
difference is theoretical on AArch64 because we don't actually store
more than the smaller amount, but it's still much better to have these
two components in agreement.

Based on Diana Picus's ARM equivalent patch (where it matters a lot
more).

llvm-svn: 296754
2017-03-02 15:34:18 +00:00
Andrew V. Tischenko 2855dc7ddc Added special test covering a problem with PIC relocation model on SLM architecture. The fix will come in D26855.
llvm-svn: 296746
2017-03-02 13:47:03 +00:00
Serge Pavlov e2bf69715f Do not verify MachimeDominatorTree if it is not calculated
If dominator tree is not calculated or is invalidated, set corresponding
pointer in the pass state to nullptr. Such pointer value will indicate
that operations with dominator tree are not allowed. In particular, it
allows to skip verification for such pass state. The dominator tree is
not calculated if the machine dominator pass was skipped, it occures in
the case of entities with linkage available_externally.

The change fixes some test fails observed when expensive checks
are enabled.

Differential Revision: https://reviews.llvm.org/D29280

llvm-svn: 296742
2017-03-02 12:00:10 +00:00
Matthias Braun dbcf9e2ee4 LiveRegMatrix: Fix some subreg interference checks
Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the apropriate subregister range
resulting in unnecessarily conservative results.

llvm-svn: 296722
2017-03-02 00:35:08 +00:00
Eli Friedman 933863ce61 Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.

llvm-svn: 296718
2017-03-02 00:08:50 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflow.
Summary:
This can be used to optimize large multiplications after legalization.

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Amaury Sechet 683f5743f6 Improve mulhi overflow test. NFC
llvm-svn: 296709
2017-03-01 23:31:19 +00:00
Eli Friedman 1c9216b003 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296708
2017-03-01 23:20:29 +00:00
Eli Friedman 28c2c0e311 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368

llvm-svn: 296701
2017-03-01 22:56:20 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Amaury Sechet 250b4a7491 Add test case for mulhi's overflow. NFC
llvm-svn: 296696
2017-03-01 22:27:21 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Sanjay Patel f8edc3e870 [x86] add vector tests for more coverage of D30502; NFC
llvm-svn: 296671
2017-03-01 20:31:23 +00:00
Nemanja Ivanovic b223cfabcc Improve scheduling with branch coalescing
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D28249

llvm-svn: 296670
2017-03-01 20:29:34 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Nirav Dave 3de7fce3ac Add test cases for merging stores of multiply used stores
llvm-svn: 296667
2017-03-01 20:18:14 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Krzysztof Parzyszek 8f23dd6d68 [Hexagon] Fix lowering of formal arguments of type i1
On Hexagon, values of type i1 are passed in registers of type i32,
even though i1 is not a legal value for these registers. This is a
special case and needs special handling to maintain consistency of
the lowering information.

This fixes PR32089.

llvm-svn: 296645
2017-03-01 17:30:10 +00:00
Diana Picus 9c52309b37 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.

llvm-svn: 296631
2017-03-01 15:35:14 +00:00
Sanjay Patel 88a1b8b466 [x86] auto-generate checks; NFC
llvm-svn: 296629
2017-03-01 14:46:59 +00:00
Sanjay Patel f0496a6a5c [x86] regenerate checks; NFC
llvm-svn: 296628
2017-03-01 14:41:57 +00:00
Sanjay Patel ffc6943011 [PPC] add tests for select-of-constants with binop; NFC
llvm-svn: 296621
2017-03-01 14:26:49 +00:00
Matt Arsenault 103af90034 AMDGPU: Re-do update for branch-relaxation test
Modify the test so that it is still testing something
closer to what it was intended to originally.

I think the original intent was to test the situation where
there was a branch on execz and then unconditional branch
required relaxing.With the change in r296539,
there was no longer and execz branch.

Change the test so that there is now an execz branch inserted.
There is no longer an unconditional branch after the execz branch,
so this might need to be tricked in some other way to keep that
there.

llvm-svn: 296574
2017-03-01 03:36:04 +00:00
Daniel Berlin 65f8cf945d Only run the overloaded-intrinsic-name.ll test once, with FileCheck.
llvm-svn: 296564
2017-03-01 01:56:41 +00:00
Daniel Berlin 3f91004ce7 Keep attributes, calling convention, etc, when remangling intrinsic
Summary: Fix issue reported where intrinsic calling convention is dropped after r295253.

Reviewers: sanjoy

Subscribers: materi, llvm-commits

Differential Revision: https://reviews.llvm.org/D30422

llvm-svn: 296563
2017-03-01 01:49:13 +00:00
Ahmed Bougacha 20b3e9a835 [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall.
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.

If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).

We need to erase them, as we know they aren't necessary (in addition to
being incorrect).

We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.

llvm-svn: 296552
2017-03-01 00:43:42 +00:00
Ahmed Bougacha 67d1c7c3c2 [GlobalISel] Replace all combined G_EXTRACT uses.
Iterating on the use-list we're modifying doesn't work: after the first
iteration, the use-list iterator will point to a MachineOperand
referencing the new register.  This caused us to skip the other uses to
replace.

Instead, use MRI.replaceRegWith(), which accounts for this behavior.

llvm-svn: 296551
2017-03-01 00:43:39 +00:00
Dan Gohman 7d7409e553 [WebAssembly] Convert the remaining unit tests to the new wasm-object-file target.
To facilitate this, add a new hidden command-line option to disable
the explicit-locals pass. That causes llc to emit invalid code that doesn't
have all locals converted to get_local/set_local, however it simplifies
testwriting in many cases.

llvm-svn: 296540
2017-02-28 23:37:04 +00:00
Daniel Berlin 06f92e6dcb Update AMDGPU test branch-relaxation.ll for changes after post-dom fixes
llvm-svn: 296539
2017-02-28 23:35:24 +00:00
Eli Friedman 36795239f5 [ARM] Don't generate deprecated T1 STM.
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.

Patch by Zhaoshi Zheng.

Differential Revision: https://reviews.llvm.org/D27910

llvm-svn: 296538
2017-02-28 23:32:55 +00:00
Krzysztof Parzyszek 33fd0bbbe8 [Hexagon] Generate extract instructions more aggressively
llvm-svn: 296537
2017-02-28 23:27:33 +00:00
Krzysztof Parzyszek f208681731 [Hexagon] Fix instruction selection for sign-extending i1 to i64
llvm-svn: 296532
2017-02-28 22:37:01 +00:00
Paul Robinson cddd60445e [DWARFv5] Emit new unit header format.
Requesting DWARF v5 will now get you the new compile-unit and
type-unit headers.  llvm-dwarfdump will also recognize them.

Differential Revision: http://reviews.llvm.org/D30206

llvm-svn: 296514
2017-02-28 20:24:55 +00:00
Sanjay Patel 74ca880749 [x86] add alternate IR tests for select of constants; NFC
llvm-svn: 296496
2017-02-28 18:02:38 +00:00
David Bozier 3246aecb42 Fix issue with test case. Make test x86_64 specific
llvm-svn: 296492
2017-02-28 17:25:38 +00:00
Craig Topper 419f145ebb [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 296486
2017-02-28 16:52:05 +00:00
David Bozier 5159968786 [Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied.

Patch by: James Henderson

Differential Revision: https://reviews.llvm.org/D29023

llvm-svn: 296483
2017-02-28 16:02:37 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Diana Picus 1ffca2aeaf [ARM] GlobalISel: Lower i32 and fp call parameters on the stack
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.

We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.

llvm-svn: 296473
2017-02-28 14:17:53 +00:00
Diana Picus 5a7203a0af [ARM] GlobalISel: Select 32-bit G_CONSTANT
Put it into a register by means of a MOVi.

llvm-svn: 296471
2017-02-28 13:05:42 +00:00
Diana Picus 5b8514559e [ARM] GlobalISel: Add mapping for G_CONSTANT
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.

llvm-svn: 296469
2017-02-28 12:13:58 +00:00
Diana Picus e6beac6742 [ARM] GlobalISel: Legalize 32-bit constants
llvm-svn: 296468
2017-02-28 11:33:46 +00:00
Diana Picus 9d07094913 [ARM] GlobalISel: Select G_GEP
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.

llvm-svn: 296462
2017-02-28 10:14:38 +00:00
Diana Picus 566a15d749 [ARM] GlobalISel: Add reg bank mapping for G_GEP
This should be the same as the mapping for G_ADD etc.

llvm-svn: 296455
2017-02-28 09:35:10 +00:00
Diana Picus 8598b17076 [ARM] GlobalISel: Legalize G_GEP with 32-bit offsets
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.

llvm-svn: 296452
2017-02-28 09:02:42 +00:00
Artyom Skrobov 24a593fd20 Relate the CHECK: lines to the functions that they're checking [NFC]
llvm-svn: 296450
2017-02-28 08:58:40 +00:00
Sanjoy Das eef785c1a5 [ImplicitNullCheck] Add alias analysis usage
Summary:
With this change ImplicitNullCheck optimization uses alias analysis
and can use load/store memory access for implicit null check if there
are other load/store before but memory accesses do not alias.

Patch by Serguei Katkov!

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30331

llvm-svn: 296440
2017-02-28 07:04:49 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Amaury Sechet 428f8f5a27 Add test case for usubo combine. NFC.
llvm-svn: 296420
2017-02-28 01:16:39 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Amaury Sechet 5a605fc6c6 Add test case for computing known bits of substraction operations. NFC
llvm-svn: 296417
2017-02-28 00:15:13 +00:00
Michael Kuperstein 13bf8a2684 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296416
2017-02-28 00:11:34 +00:00
Matt Arsenault 10268f93e8 AMDGPU: Use v_med3_{f16|i16|u16}
llvm-svn: 296401
2017-02-27 22:40:39 +00:00
Matt Arsenault eb522e68bc AMDGPU: Support v2i16/v2f16 packed operations
llvm-svn: 296396
2017-02-27 22:15:25 +00:00
Arnold Schwaighofer b2605f31ed ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.

rdar://30536186

llvm-svn: 296395
2017-02-27 22:12:06 +00:00
Sanjay Patel ae7873fe55 [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition
The transform in question claims to be doing:

// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))

...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.

This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.

The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's 
not affected by this code change.

Differential Revision:
https://reviews.llvm.org/D30355

llvm-svn: 296389
2017-02-27 21:30:54 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Matt Arsenault 7596f13d15 AMDGPU: Support inlineasm for packed instructions
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.

llvm-svn: 296379
2017-02-27 20:52:10 +00:00
Matt Arsenault 2ed2193218 AMDGPU: Don't fold immediate if clamp/omod are set
Doesn't fix any practical problems because clamp/omod
are currently folded after peephole optimizer.

llvm-svn: 296375
2017-02-27 20:21:31 +00:00
Matt Arsenault 3cb390498e AMDGPU: Fold omod into instructions
llvm-svn: 296372
2017-02-27 19:35:42 +00:00
Taewook Oh a49eb8578a [TailDuplicator] Maintain DebugLoc for branch instructions
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.

Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie

Reviewed By: dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D30026

llvm-svn: 296371
2017-02-27 19:30:01 +00:00
Matt Arsenault e2d1d3a940 AMDGPU: Add f16 to shader calling conventions
Mostly useful for writing tests for f16 features.

llvm-svn: 296370
2017-02-27 19:24:47 +00:00
Amaury Sechet 10c7fb4187 Refactor xaluo.ll and xmulo.ll tests. NFC
llvm-svn: 296367
2017-02-27 18:32:54 +00:00
Amaury Sechet 00bf6f52dc Remove an empty line in icmp-illegal.ll . NFC
llvm-svn: 296350
2017-02-27 16:09:44 +00:00
Artur Pilipenko f7196c8d9e [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets
This pattern is essentially a i16 load from p+1 address:

  %p1.i16 = bitcast i8* %p to i16*
  %p2.i8 = getelementptr i8, i8* %p, i64 2
  %v1 = load i16, i16* %p1.i16
  %v2.i8 = load i8, i8* %p2.i8
  %v2 = zext i8 %v2.i8 to i16
  %v1.shl = shl i16 %v1, 8
  %res = or i16 %v1.shl, %v2

Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.

llvm-svn: 296336
2017-02-27 13:04:23 +00:00
Amaury Sechet 681472cd0f Do full codegen for various tests. NFC
llvm-svn: 296305
2017-02-27 01:15:57 +00:00
Daniel Jasper 3ca4525612 Revert "[CGP] Split some critical edges coming out of indirect branches"
This reverts commit r296149 as it leads to crashes when compiling for
PPC.

llvm-svn: 296295
2017-02-26 11:09:12 +00:00
Craig Topper 6028584d8c [X86] Fix execution domain for cmpss/sd instructions.
llvm-svn: 296293
2017-02-26 06:45:59 +00:00
Craig Topper e70231be51 [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.
llvm-svn: 296291
2017-02-26 06:45:54 +00:00
Craig Topper fe25988c68 [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.
llvm-svn: 296290
2017-02-26 06:45:51 +00:00
Craig Topper 6bf9b809ce [AVX-512] Fix execution domain for VPMADD52 instructions.
llvm-svn: 296288
2017-02-26 06:45:45 +00:00
Craig Topper 9ef7e44d2f [AVX-512] Use update_llc_test_checks.py to regenerate a test.
llvm-svn: 296287
2017-02-26 06:45:43 +00:00
Craig Topper ed64904c74 [X86] Fix the execution domain for scalar SQRT intrinsic instruction.
llvm-svn: 296284
2017-02-26 06:45:35 +00:00