Commit Graph

44558 Commits

Author SHA1 Message Date
Amaury Sechet 153911f71d [DAGCombine] (add X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry)
Summary: Common pattern when legalizing large integers operations. Similar to D32687, when the carry isn't used.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Differential Revision: https://reviews.llvm.org/D32738

llvm-svn: 301919
2017-05-02 13:34:25 +00:00
Simon Pilgrim 6615bde93b [X86][SSE] Add test for PR30264 (combining multiple constants inputs in a shuffle)
llvm-svn: 301915
2017-05-02 12:25:17 +00:00
Simon Pilgrim 89ad89cc73 [SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088)
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.

This patch adds support for extension/truncation for both integer and float types.

Differential Revision: https://reviews.llvm.org/D32391

llvm-svn: 301910
2017-05-02 10:33:08 +00:00
Simon Pilgrim 8deb87a6c0 [DAGCombiner] Improve MatchBswapHword logic (PR31357)
The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade).

Patch originally introduced by Jim Lewis.

Submitted on the behalf of Dinar Temirbulatov.

Differential Revision: https://reviews.llvm.org/D32039

llvm-svn: 301907
2017-05-02 10:16:19 +00:00
Xinliang David Li 6133846be1 [PartialInlining] Hook up inline cost analysis
Differential Revision: http://reviews.llvm.org/D32666

llvm-svn: 301894
2017-05-02 02:44:14 +00:00
Dylan McKay 28355efdad [AVR] Save/restore the frame pointer for all functions
A recent commit I made made it so that we only did this for signal or
interrupt handlers. This broke normal functions.

llvm-svn: 301893
2017-05-02 01:57:48 +00:00
Nemanja Ivanovic b89c27f515 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
Fixes PR30730.
This is a re-commit of a pulled commit. The commit was pulled because some
software projects contained uses of Altivec vectors that violated alignment
requirements. Known issues have now been fixed.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D26861

llvm-svn: 301892
2017-05-02 01:47:34 +00:00
Ahmed Bougacha 899a75cefe [AArch64] armv8-A doesn't have LSE.
r288279 mistakenly added it to all arches, but it's only available
from v8.1 onwards.

The testcase is awkward, because (I suspect) of PR32873.

Spotted by inspection.

llvm-svn: 301890
2017-05-02 00:45:01 +00:00
George Burgess IV 7bc507a2e8 Revert r301880
This change caused buildbot failures, apparently because we're not
passing around types that InstSimplify is used to seeing. I'm not overly
familiar with InstSimplify, so I'm reverting this until I can figure out
what exactly is wrong.

llvm-svn: 301885
2017-05-01 23:54:41 +00:00
Zachary Turner 8a2ebfb1cd [CodeView] Write CodeView line information.
Differential Revision: https://reviews.llvm.org/D32716

llvm-svn: 301882
2017-05-01 23:27:42 +00:00
George Burgess IV 6935aefdf0 [InstSimplify] Handle selects of GEPs with 0 offset
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)

Differential Revision: https://reviews.llvm.org/D31435

llvm-svn: 301880
2017-05-01 23:12:08 +00:00
Matthias Braun ab9438cb03 MachineFrameInfo: Track whether MaxCallFrameSize is computed yet; NFC
This tracks whether MaxCallFrameSize is computed yet. Ideally we would
assert and fail when the value is queried before it is computed, however
this fails various targets that need to be fixed first.

Differential Revision: https://reviews.llvm.org/D32570

llvm-svn: 301851
2017-05-01 22:32:25 +00:00
Davide Italiano 2dfd46bf08 [NewGVN] Don't derive incorrect implications.
In the testcase attached,  we believe %tmp1 implies %tmp4.
where:
  br i1 %tmp1, label %bb2, label %bb7
  br i1 %tmp4, label %bb5, label %bb7

because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.

Differential Revision:  https://reviews.llvm.org/D32718

llvm-svn: 301849
2017-05-01 22:26:28 +00:00
Sanjay Patel 59d0aeaafe [InstCombine] check one-use before applying DeMorgan nor/nand folds
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need 
the intermediate (~X & Y) value. In that case, we would need an extra instruction to 
generate ~Y + 'or' (as shown in the test changes).

It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate 
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something 
we can't answer in IR.

Differential Revision: https://reviews.llvm.org/D32703

llvm-svn: 301848
2017-05-01 22:25:42 +00:00
Peter Collingbourne c15d60b772 Object: Remove ModuleSummaryIndexObjectFile class.
Differential Revision: https://reviews.llvm.org/D32195

llvm-svn: 301832
2017-05-01 20:42:32 +00:00
Krzysztof Parzyszek 55db483a46 [Hexagon] Improving error reporting for writing to read only registers
Patch by Colin LeMahieu.

llvm-svn: 301828
2017-05-01 20:10:41 +00:00
Krzysztof Parzyszek e96d27a997 [Hexagon] Give better error messages for solo instruction errors
Patch by Colin LeMahieu.

llvm-svn: 301827
2017-05-01 20:06:01 +00:00
Xin Tong a4b9b9f42a Take indirect branch into account as well when folding.
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.

llvm-svn: 301816
2017-05-01 17:15:37 +00:00
Sanjoy Das ddebb703fc Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts
In cases where an instruction (a call site, say) is RAUW'ed with some
other value (this is possible via the `returned` attribute, for
instance), we want the slot in UnknownInsts to point to the original
Instruction we wanted to track, not the value it got replaced by.

Fixes PR32587.

This relands r301426.

llvm-svn: 301814
2017-05-01 17:07:56 +00:00
Craig Topper 6b1b630a98 [SelectionDAG] Use known ones to provide a better bound for the known zeros for CTTZ/CTLZ operations.
This is the SelectionDAG version of D32521. If know where at least one 1 is located in the input to these intrinsics we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output.

I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets. Usually because of turning ADD into OR based on this new information.

I'll fix CTPOP in a future patch.

Differential Revision: https://reviews.llvm.org/D32692

llvm-svn: 301806
2017-05-01 16:08:06 +00:00
Xin Tong 21f8ac235e [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32407

llvm-svn: 301804
2017-05-01 15:34:17 +00:00
Sanjay Patel d2f13b62d9 [InstCombine] add multi-use variants for DeMorgan folds; NFC
llvm-svn: 301802
2017-05-01 14:52:17 +00:00
Sanjay Patel c526fbcfa9 [InstCombine] use FileCheck and auto-generate checks; NFC
llvm-svn: 301801
2017-05-01 14:20:30 +00:00
Sanjay Patel 4e312203af [InstCombine] consolidate more DeMorgan tests; NFC
llvm-svn: 301800
2017-05-01 14:10:59 +00:00
Michael Zuckerman da4b52e4bf Fix test for altmacro
llvm-svn: 301799
2017-05-01 14:00:54 +00:00
Michael Zuckerman 56704618aa [LLVM][inline-asm] Altmacro absolute expression '%' feature
In this patch, I introduce a new alt macro feature.
This feature adds meaning for the % when using it as a prefix to the calling macro arguments.

In the altmacro mode, the percent sign '%' before an absolute expression convert the expression first to a string. 
As described in the https://sourceware.org/binutils/docs-2.27/as/Altmacro.html
"Expression results as strings
You can write `%expr' to evaluate the expression expr and use the result as a string."

expression assumptions:

1. '%' can only evaluate an absolute expression.
2. Altmacro '%' must be the first character of the evaluated expression.
3. If no '%' is located before the expression, a regular module operation is expected.
4. The result of Absolute Expressions can be only integer.

Differential Revision: https://reviews.llvm.org/D32526

llvm-svn: 301797
2017-05-01 13:20:12 +00:00
Dylan McKay 59e7fe3da8 [AVR] Implement non-constant bit rotations
This lets us do bit rotations of variable amount.

llvm-svn: 301794
2017-05-01 09:48:55 +00:00
Igor Breger 4064dc76c5 [GlobalISel][X86] rename test file. NFC.
llvm-svn: 301793
2017-05-01 08:11:02 +00:00
Craig Topper c8b5693948 [X86] Add tests for opportunities to improve known bits for CTTZ and CTLZ.
llvm-svn: 301791
2017-05-01 06:33:17 +00:00
Igor Breger c08a783521 [GlobalISel][X86] G_SEXT/G_ZEXT support.
Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32591

llvm-svn: 301790
2017-05-01 06:30:16 +00:00
Igor Breger a9edb88d46 [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.
Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.

Reviewers: zvi, guyblank

Reviewed By: zvi, guyblank

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32217

llvm-svn: 301788
2017-05-01 06:08:32 +00:00
Dylan McKay 2e8718bcbb [AVR] Fix a bug so that we now emit R_AVR_16 fixups with the correct offset
Before this, the LDS/STS instructions would have their opcodes
overwritten while linking.

llvm-svn: 301782
2017-04-30 23:33:52 +00:00
Sanjay Patel ad13826aea [DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657)
We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that
patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check
legality of the select that we want to produce.

A few things to note:

1. We can't wait until after legalization and do this generically because (at least in the x86
   tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but
   that requires a closer look to make sure we don't end up worse.
3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. 
   That should be fixed next.
4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple 
   legal vector sizes, but if there are other targets like that, we should add more tests.
5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests):
   despite IR that is terrible for the target, this patch allows us to generate the optimal loop
   code because something post-ISEL is hoisting the splat extends above the vector loops.

Differential Revision: https://reviews.llvm.org/D32620

llvm-svn: 301781
2017-04-30 22:44:51 +00:00
Sanjoy Das 08989c7ecd Rename isKnownNotFullPoison to programUndefinedIfPoison; NFC
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).

Reviewers: broune, majnemer, bjarke.roune

Reviewed By: broune

Subscribers: mcrosier, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30444

llvm-svn: 301776
2017-04-30 19:41:19 +00:00
Amaury Sechet 8ac81f3924 Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29872

llvm-svn: 301775
2017-04-30 19:24:09 +00:00
Sanjay Patel 0c6086f493 [InstCombine] consolidate tests for DeMorgan folds; NFC
I'm proposing to add tests and change behavior in D32665.

llvm-svn: 301774
2017-04-30 18:57:12 +00:00
Zvi Rackover 4086e13e0d InstructionSimplify: Simplify a shuffle with a undef mask to undef
Summary:
Following the discussion in pr32486, adding the simplification:
 shuffle %x, %y, undef -> undef

Reviewers: spatel, RKSimon, andreadb, davide

Reviewed By: spatel

Subscribers: jroelofs, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D32293

llvm-svn: 301764
2017-04-30 06:06:26 +00:00
Simon Atanasyan 3979f43813 [mips] Emit R_MICROMIPS_TLS_GOTTPREL relocation for %gottprel in case of microMIPS
In case of microMIPS mode %gottprel operator should emit microMIPS
relocation R_MICROMIPS_TLS_GOTTPREL, not R_MIPS_TLS_GOTTPREL.

Differential Revision: http://reviews.llvm.org/D32617

llvm-svn: 301763
2017-04-30 04:27:23 +00:00
Daniel Sanders 887a141d4d [globalisel][tablegen] Fix the test after silencing the unused variable warning in r301755.
llvm-svn: 301756
2017-04-29 19:46:27 +00:00
Daniel Sanders e9fdba39e0 [globalisel][tablegen] Compute available feature bits correctly.
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().

Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.

Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32491

llvm-svn: 301750
2017-04-29 17:30:09 +00:00
Simon Pilgrim 694cb2c838 [X86][AVX] Added codegen tests for _mm256_zext* helper intrinsics (PR32839)
Not great codegen, especially as VEX moves support implicit zeroing of upper bits....

llvm-svn: 301748
2017-04-29 17:15:12 +00:00
Simon Pilgrim ac7f3e24d3 [X86][SSE] Add initial <2 x half> tests for PR31088
As discussed on D32391, test X86/X64 SSE2 and X64 F16C.

llvm-svn: 301744
2017-04-29 14:29:06 +00:00
Matt Arsenault 2a80369ae4 AMDGPU: Fix copies from physical registers in SIFixSGPRCopies
This would assert when there were multiple defs of
a physical register.

We just need to move all of the users of it.

llvm-svn: 301730
2017-04-29 01:26:34 +00:00
Zachary Turner 5b6e4e0aed [llvm-pdbdump] Abstract some of the YAML/Raw printing code.
There is a lot of duplicate code for printing line info between
YAML and the raw output printer.  This introduces a base class
that can be shared between the two, and makes some minor
cleanups in the process.

llvm-svn: 301728
2017-04-29 01:13:21 +00:00
Akira Hatanaka 6fdcb3c2ce [ObjCARC] Do not move a release between a call and a
retainAutoreleasedReturnValue that retains the returned value.

This commit fixes a bug in ARC optimizer where it moves a release
between a call and a retainAutoreleasedReturnValue, causing the returned
object to be released before the retainAutoreleasedReturnValue can
retain it.

This commit accomplishes that by doing a lookahead and checking whether
the call prevents the release from moving upwards. In the long term, we
should treat the region between the retainAutoreleasedReturnValue and
the call as a critical section and disallow moving anything there
(possibly using operand bundles).

rdar://problem/20449878

llvm-svn: 301724
2017-04-29 00:23:11 +00:00
Davide Italiano 534e314356 [LoopUnswitch] Don't remove instructions with side effects.
This fixes PR32818.

Differential Revision:  https://reviews.llvm.org/D32664

llvm-svn: 301722
2017-04-29 00:12:18 +00:00
Sanjay Patel c8ab6bb27d [InstCombine] add tests to show potentially bogus application of DeMorgan (NFC)
llvm-svn: 301714
2017-04-28 23:14:33 +00:00
Matt Arsenault e0f9e984fd InferAddressSpaces: Search constant expressions for addrspacecasts
These are pretty common when using local memory, and the 64-bit generic
addressing is much more expensive to compute.

llvm-svn: 301711
2017-04-28 22:52:41 +00:00
Adrian Prantl fed4f399d3 Remove line and file from DINamespace.
Fixes the issue highlighted in
http://lists.llvm.org/pipermail/cfe-dev/2014-June/037500.html.

The DW_AT_decl_file and DW_AT_decl_line attributes on namespaces can
prevent LLVM from uniquing types that are in the same namespace. They
also don't carry any meaningful information.

rdar://problem/17484998
Differential Revision: https://reviews.llvm.org/D32648

llvm-svn: 301706
2017-04-28 22:25:46 +00:00
Matt Arsenault a1e734050c InferAddressSpaces: Infer from just addrspacecasts
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.

llvm-svn: 301704
2017-04-28 22:18:08 +00:00
Krzysztof Parzyszek 072ddb383c [RDF] Correctly calculate lane masks for defs
llvm-svn: 301700
2017-04-28 21:57:53 +00:00
Krzysztof Parzyszek 2065a2f4e6 Properly handle PHIs with subregisters in UnreachableBlockElim
When a PHI operand has a subregister, create a COPY instead of simply
replacing the PHI output with the input it.

Differential Revision: https://reviews.llvm.org/D32650

llvm-svn: 301699
2017-04-28 21:56:33 +00:00
Krzysztof Parzyszek 0b3acbb1dd [Hexagon] Do not move a block if it is on a fall-through path
llvm-svn: 301698
2017-04-28 21:54:11 +00:00
Sam Clegg a06de02889 [WebAssembly] Add size of section header to data relocation offsets.
Also, add test for data relocations and fix addend to
be signed.

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D32513

llvm-svn: 301690
2017-04-28 21:22:38 +00:00
Matt Arsenault cf5e7fe358 [ValueTracking] Teach isSafeToSpeculativelyExecute() about the speculatable attribute
Patch by Tom Stellard

llvm-svn: 301688
2017-04-28 21:13:09 +00:00
Sam Clegg ff0730b3fc [WebAssembly] Write initial memory in pages not bytes
Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D32660

llvm-svn: 301687
2017-04-28 21:12:09 +00:00
Matt Arsenault b19b57ea60 Add speculatable function attribute
This attribute tells the optimizer that the function may be speculated.

Patch by Tom Stellard

llvm-svn: 301680
2017-04-28 20:25:27 +00:00
Marek Olsak 2d82590f64 AMDGPU: Add new amdgcn.init.exec intrinsics
v2: More tests, bug fixes, cosmetic changes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D31762

llvm-svn: 301677
2017-04-28 20:21:58 +00:00
Alexei Starovoitov f7bd5ebd3b [bpf] add bigendian support to disassembler
. swap 4-bit register encoding, 16-bit offset and 32-bit imm to support big endian archs
. add a test

Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 301653
2017-04-28 16:51:01 +00:00
Jun Bum Lim 919f9e8d65 [InlineCost] Improve the cost heuristic for Switch
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets

```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)

```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp             w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp             w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp             w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add             w8, w19, w8
        cmp             w8, #9          // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and             w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add             w8, w19, w8
        cmp             w8, #2          // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp             w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.

This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
  -inline-generic-switch-cost=false

This change was originally proposed by Haicheng in D29870.

Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier

Reviewed By: hans

Subscribers: joerg, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D31085

llvm-svn: 301649
2017-04-28 16:04:03 +00:00
Teresa Johnson 51177295c4 Memory intrinsic value profile optimization: Avoid divide by 0
Summary:
Skip memops if the total value profiled count is 0, we can't correctly
scale up the counts and there is no point anyway.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32624

llvm-svn: 301645
2017-04-28 14:30:54 +00:00
Simon Pilgrim 7ae9419dc0 [DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT (reapplied)
Reapplied r299221 after fix for nondeterminism in ThinLTO builder (rL301599), with extra check for implicit truncation of inserted element.

llvm-svn: 301644
2017-04-28 13:21:18 +00:00
Simon Pilgrim ec93334317 [X86][SSE] Added new tests from D32416 to show codegen delta
llvm-svn: 301641
2017-04-28 11:53:08 +00:00
Simon Pilgrim 04928fd021 [X86][SSE] Renames all ones test to better match type.
Added 8f32/4f64 optsize tests discussed on D32416

llvm-svn: 301639
2017-04-28 11:12:30 +00:00
Simon Pilgrim 67b1a79985 [X86][SSE] Add codegen test for _mm_set_pd1 (PR32827)
llvm-svn: 301638
2017-04-28 10:31:42 +00:00
Andrew Ng 03e35b6bc0 [DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values.
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.

Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.

Differential Revision: https://reviews.llvm.org/D31604

llvm-svn: 301630
2017-04-28 08:44:30 +00:00
Diana Picus 0674a3ce97 [ARM] GlobalISel: Tighten test. NFC
Explicitly check types and load sizes in the IRTranslator test.

llvm-svn: 301627
2017-04-28 07:50:47 +00:00
Max Kazantsev 531db9a504 [EarlyCSE] Mark the condition of assume intrinsic as true
EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions.

Reviewers: sanjoy, reames, apilipenko, anna, skatkov

Reviewed By: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32482

llvm-svn: 301625
2017-04-28 06:25:39 +00:00
Max Kazantsev 0589d9fa0f [EarlyCSE] Remove guards with conditions known to be true
If a condition is calculated only once, and there are multiple guards on this condition, we should be able
to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition
of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and
mark its condition as 'true' for future consideration.

Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin

Reviewed By: reames, sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32476

llvm-svn: 301623
2017-04-28 06:05:48 +00:00
Sanjoy Das ba0daee6b2 [StackMaps] Increase the size of the "location size" field
Summary:
In some cases LLVM (especially the SLP vectorizer) will create vectors
that are 256 bytes (or larger).  Given that this is intentional[0] is
likely to get more common, this patch updates the StackMap binary
format to deal with the spill locations for said vectors.

This change also bumps the stack map version from 2 to 3.

[0]: https://reviews.llvm.org/D32533#738350

Reviewers: reames, kavon, skatkov, javed.absar

Subscribers: mcrosier, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D32629

llvm-svn: 301615
2017-04-28 04:48:42 +00:00
Saleem Abdulrasool 41d9ef3ced COFF Import: expose both symbols
COFF Import libraries which use the obsolete CONSTANT export are
supposed to get two symbols, one with the `_imp_` prefix and one
without.  Ensure that we expose both for iteration.  This is necessary
to fix the librarian with COFF CONSTANT exports.

llvm-svn: 301614
2017-04-28 04:29:43 +00:00
Zachary Turner 7159ab95c7 [llvm-pdbdump] Allow printing only a portion of a stream.
When dumping raw data from a stream, you might know the offset
of a certain record you're interested in, as well as how long
that record is.  Previously, you had to dump the entire stream
and wade through the bytes to find the interesting record.

This patch allows you to specify an offset and length on the
command line, and it will only dump the requested range.

llvm-svn: 301607
2017-04-28 00:43:38 +00:00
Sam Clegg 10545c9c24 [WebAssembly] Add some tests for wasm MC layer
Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D32558

llvm-svn: 301606
2017-04-28 00:36:36 +00:00
Sanjay Patel 73d8c43da8 [InstCombine] fix matcher to bind to specific operand (PR32830)
Matching any random value would be very wrong:
https://bugs.llvm.org/show_bug.cgi?id=32830

llvm-svn: 301594
2017-04-27 21:55:03 +00:00
Evgeniy Stepanov 964f4663c4 [asan] Fix dead stripping of globals on Linux.
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.

Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.

This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.

At this moment it does not work on Gold (as in the globals are never
stripped).

This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.

llvm-svn: 301587
2017-04-27 20:27:27 +00:00
Evgeniy Stepanov 716f0ff222 [asan] Put ctor/dtor in comdat.
When possible, put ASan ctor/dtor in comdat.

The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.

The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.

This is a second re-land of r298756. This time with a flag to disable
the whole thing to avoid a bug in the gold linker:
  https://sourceware.org/bugzilla/show_bug.cgi?id=19002

llvm-svn: 301586
2017-04-27 20:27:23 +00:00
Simon Pilgrim 9a08ad8abd [X86][SSE] Add tests for broadcast from larger vector loads
llvm-svn: 301583
2017-04-27 20:19:00 +00:00
Zachary Turner 8d6396d3b0 [llvm-readobj] Dump COFF Resources section.
This patch dumps the raw bytes of the .rsrc sections that
are present in COFF object and executable files.  Subsequent
patches will parse this information and dump in a more human
readable format.

Differential Revision: https://reviews.llvm.org/D32463
Patch By: Eric Beckmann

llvm-svn: 301578
2017-04-27 19:38:38 +00:00
Chandler Carruth 1353f9a48b [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
  below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
  infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
  and instead incrementally updates it.

I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.

My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.

This pass makes two very significant assumptions that enable most of these
improvements:

1) Focus on *full* unswitching -- that is, completely removing whatever
   control flow construct is being unswitched from the loop. In the case
   of trivial unswitching, this means removing the trivial (exiting)
   edge. In non-trivial unswitching, this means removing the branch or
   switch itself. This is in opposition to *partial* unswitching where
   some part of the unswitched control flow remains in the loop. Partial
   unswitching only really applies to switches and to folded branches.
   These are very similar to full unrolling and partial unrolling. The
   full form is an effective canonicalization, the partial form needs
   a complex cost model, cannot be iterated, isn't canonicalizing, and
   should be a separate pass that runs very late (much like unrolling).

2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
   dates from a time when a great deal of LLVM's loop infrastructure was
   missing, ineffective, and/or unreliable. As a consequence, a lot of
   complexity was added which we no longer need.

With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.

Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.

One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.

Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.

Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.

Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.

I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.

Differential Revision: https://reviews.llvm.org/D32409

llvm-svn: 301576
2017-04-27 18:45:20 +00:00
Eli Friedman 10ab923b32 [GlobalOpt] Correctly update metadata when localizing a global.
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.

Differential Revision: https://reviews.llvm.org/D32551

llvm-svn: 301575
2017-04-27 18:39:08 +00:00
Sanjoy Das 40c32dd9a0 Use a pointer type for target frame indices during statepoint lowering
Summary:
The type of the target frame index is intptr, not the type of the value we're
going to store into it.  Without this change we crash in the attached test case
when trying to type-legalize a TargetFrameIndex.

Patchpoint lowering types the target frame index as intptr as well.

Reviewers: reames, bogner, arsenm

Subscribers: arsenm, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D32256

llvm-svn: 301566
2017-04-27 17:17:16 +00:00
Xinliang David Li d21601a929 [PartialInlining]: Improve partial inlining to handle complex conditions
Differential Revision: http://reviews.llvm.org/D32249

llvm-svn: 301561
2017-04-27 16:34:00 +00:00
Sanjay Patel c3e00fcadd [x86] add minimal tests for potential size-changing vsel transforms; NFC
llvm-svn: 301554
2017-04-27 16:10:20 +00:00
Sam Kolton 5d99386b4d [AMDGPU] DPP: add support for GFX9
Reviewers: artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32588

llvm-svn: 301551
2017-04-27 15:42:38 +00:00
Zoran Jovanovic ffef3e3c6a [mips][microMIPS] Adding code size reduction pass for MicroMIPS
Author: milena.vujosevic.janicic
Reviewers: sdardis
The code implements size reduction pass for MicroMIPS.
Load and store instructions are examined and transformed, if possible.
lw32 instruction is transformed into 16-bit instruction lwsp
sw32 instruction is transformed into 16-bit instruction swsp
Arithmetic instrcutions are examined and transformed, if possible.
addu32 instruction is transformed into 16-bit instruction addu16
subu32 instruction is transformed into 16-bit instruction subu16
Differential Revision: https://reviews.llvm.org/D15144

llvm-svn: 301540
2017-04-27 13:10:48 +00:00
Diana Picus 4f46be327c [ARM] GlobalISel: Fix extended stack operands
Fix a crash when trying to extend a value passed as a sign- or
zero-extended stack parameter. The cause of the crash was that we were
setting the size of the loaded value to 32 bits, and then tyring to
extend again to 32 bits.

This patch addresses the issue by also introducing a G_TRUNC after the
load. This will leave the unused bits to their original values set by
the caller, while being consistent about the types. For values that are
not extended, we just use a smaller load.

llvm-svn: 301531
2017-04-27 10:23:30 +00:00
Andrew V. Tischenko 9108ae2b50 2 tests that were lost in rL301390
llvm-svn: 301529
2017-04-27 10:20:35 +00:00
George Rimar e6ef4488e1 [llvm-dwarfdump] - Change format for .gdb_index dump.
It is useful to output size of ranges when address ranges
section of .gdb_index is dumped.

It helps to compare outputs produced by different linkers,
for example. In that case address ranges can look very different,
when they are the same at fact. Difference comes from different 
low address because of different address of .text.

Differential revision: https://reviews.llvm.org/D32492

llvm-svn: 301527
2017-04-27 10:00:13 +00:00
Igor Breger 360d0f23ee [GlobalISel][X86] handle not symmetric G_COPY
Summary: handle not symmetric G_COPY

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32420

llvm-svn: 301523
2017-04-27 08:02:03 +00:00
Konstantin Zhuravlyov 97a663b6a2 AMDGPU: Fix assert in scheduler
Assert is triggered if DBG_VALUE is first instruction in BB

Differential Revision: https://reviews.llvm.org/D32572

llvm-svn: 301511
2017-04-27 03:22:44 +00:00
Chandler Carruth c246a4c973 Disable GVN Hoist due to still more bugs being found in it. There is
also a discussion about exactly what we should do prior to re-enabling
it.

The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.

llvm-svn: 301505
2017-04-27 00:28:03 +00:00
Rui Ueyama 0fcbb2893e Revert r301487: Replace HashString algorithm with xxHash64
This reverts commit r301487 to make buildbots green.

llvm-svn: 301491
2017-04-26 23:15:10 +00:00
Adrian Prantl 1d12b885b0 Add support for DW_TAG_thrown_type.
For Swift we would like to be able to encode the error types that a
function may throw, so the debugger can display them alongside the
function's return value when finish-ing a function.

DWARF defines DW_TAG_thrown_type (intended to be used for C++ throw()
declarations) that is a perfect fit for this purpose. This patch wires
up support for DW_TAG_thrown_type in LLVM by adding a list of thrown
types to DISubprogram.

To offset the cost of the extra pointer, there is a follow-up patch
that turns DISubprogram into a variable-length node.

rdar://problem/29481673

Differential Revision: https://reviews.llvm.org/D32559

llvm-svn: 301489
2017-04-26 22:56:44 +00:00
Rui Ueyama 87b30ac9d3 Replace HashString algorithm with xxHash64
The previous algorithm processed one character at a time, which is very
painful on a modern CPU. Replace it with xxHash64, which both already
exists in the codebase and is fairly fast.

Patch from Scott Smith!

Differential Revision: https://reviews.llvm.org/D32509

llvm-svn: 301487
2017-04-26 22:45:04 +00:00
Sanjay Patel a0547c3d9f [DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X)
Besides better codegen, the motivation is to be able to canonicalize this pattern 
in IR (currently we don't) knowing that the backend is prepared for that.

This may also allow removing code for special constant cases in 
DAGCombiner::foldSelectOfConstants() that was added in D30180.

Differential Revision: https://reviews.llvm.org/D31944

llvm-svn: 301457
2017-04-26 20:26:46 +00:00
Dmitry Preobrazhensky 43d297eb45 [AMDGPU][MC] Added arg checks for vmcnt, expcnt, lgkmcnt helpers
Summary of changes:
- corrected vmcnt, expcnt, lgkmcnt helpers to checks their argument for truncation;
- added saturated versions of these helpers.

See bug 32711 for details: https://bugs.llvm.org//show_bug.cgi?id=32711

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D32546

llvm-svn: 301439
2017-04-26 17:55:50 +00:00
Peter Collingbourne fa58f7528e LTO: Mark undefined module asm symbols as used.
Marking them as used causes them to be considered visible outside of LTO. This
prevents the symbols from being internalized or discarded, either by GlobalDCE
or by summary-based dead stripping in ThinLTO.

This change makes it unnecessary to add these symbols to llvm.compiler.used
in the backend, as the symbols are kept alive by virtue of being external,
so remove the backend code that handles that.

Fixes PR32798.

Differential Revision: https://reviews.llvm.org/D32544

llvm-svn: 301438
2017-04-26 17:53:39 +00:00
Sanjoy Das 2cbeb00f38 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

llvm-svn: 301429
2017-04-26 16:37:05 +00:00
Matthew Simpson 9eed0bee3d [LV] Handle external uses of floating-point induction variables
Reference: https://bugs.llvm.org/show_bug.cgi?id=32758
Differential Revision: https://reviews.llvm.org/D32445

llvm-svn: 301428
2017-04-26 16:23:02 +00:00
Sanjoy Das 8b32b81954 Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts
Summary:
In cases where an instruction (a call site, say) is RAUW'ed with some
other value (this is possible via the `returned` attribute, amongst
other things), we want the slot in UnknownInsts to point to the
original Instruction we wanted to track, not the value it got replaced
by.

Fixes PR32587.

Reviewers: davide

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D32268

llvm-svn: 301426
2017-04-26 16:21:02 +00:00