Commit Graph

142098 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 0ca1987977 Fix ubsan failures in lane mask shifts
llvm-svn: 289826
2016-12-15 16:08:49 +00:00
Simon Pilgrim d7518896ff [X86][SSE] Fix domains for VZEXT_LOAD type instructions
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.

Differential Revision: https://reviews.llvm.org/D27684

llvm-svn: 289825
2016-12-15 16:05:29 +00:00
Alexander Timofeev a57511c451 Fix for regression after Global Load Scalarization patch
llvm-svn: 289822
2016-12-15 15:17:19 +00:00
Krzysztof Parzyszek 91b5cf8412 Extract LaneBitmask into a separate type
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.

Differential Revision: https://reviews.llvm.org/D27454

llvm-svn: 289820
2016-12-15 14:36:06 +00:00
Simon Pilgrim 2f7f0e7a48 [CostModel][X86] Updated reverse shuffle costs
llvm-svn: 289819
2016-12-15 14:24:07 +00:00
Alexey Bataev 4160264e30 [TEST] Initial commit of tests for minmax horizontal reductions.
llvm-svn: 289817
2016-12-15 13:21:29 +00:00
Alexey Bataev 2db6045b29 Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"
This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347.

llvm-svn: 289814
2016-12-15 12:26:18 +00:00
Ehsan Amiri 795b0671c5 [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp
A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.

llvm-svn: 289813
2016-12-15 12:25:13 +00:00
Simon Pilgrim 9876ed07f6 [CostModel] Fix long standing bug with reverse shuffle mask detection
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles

llvm-svn: 289811
2016-12-15 12:12:45 +00:00
Alexey Bataev 67c90c7d95 [TESTS] Initial commit of tests, by Andrew Tischenko
llvm-svn: 289807
2016-12-15 11:48:24 +00:00
Nemanja Ivanovic 552c8e960e [Power9] Allow AnyExt immediates for XXSPLTIB
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.

Fixes PR31340.

llvm-svn: 289804
2016-12-15 11:16:20 +00:00
Dylan McKay 4f590f28e7 [AVR] Support floats in the instrumention pass
This also refactors some common code into the 'GetTypeName' method.

llvm-svn: 289803
2016-12-15 11:02:41 +00:00
Simon Pilgrim 9ebeac3eed [CostModel][X86] Add tests for reverse shuffle costs
llvm-svn: 289800
2016-12-15 10:45:53 +00:00
Prakhar Bahuguna bc35f21f70 Add missing triple target for numeric section flag test
llvm-svn: 289798
2016-12-15 10:20:48 +00:00
Pavel Labath 08c2e86802 Simplify format member detection in FormatVariadic
Summary:
This replaces the format member search, which was quite complicated, with a more
direct approach to detecting whether a class should be formatted using the
format-member method. Instead we use a special type llvm::format_adapter, which
every adapter must inherit from. Then the search can be simply implemented with
the is_base_of type trait.

Aside from the simplification, I like this way more because it makes it more
explicit that you are supposed to use this type only for adapter-like
formattings, and the other approach (format_provider overloads) should be used
as a default (a mistake I made when first trying to use this library).

The only slight change in behaviour here is that now choose the format-adapter
branch even if the format member invocation will fail to compile (e.g. because it is a
non-const member function and we are passing a const adapter), whereas
previously we would have gone on to search for format_providers for the type.
However, I think that is actually a good thing, as it probably means the
programmer did something wrong.

Reviewers: zturner, inglorion

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27679

llvm-svn: 289795
2016-12-15 09:40:27 +00:00
Sjoerd Meijer 96e10b5a9e [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This is essentially a recommit of r285893, but with a correctness fix. The
problem of the original commit was that this:

bic r5, r7, #31
cbz r5, .LBB2_10

got rewritten into:

lsrs  r5, r7, #5
beq .LBB2_10

The result in destination register r5 is not the same and this is incorrect
when r5 is not dead. So this fix includes checking the uses of the AND
destination register. And also, compared to the original commit, some regression
tests didn't need changing anymore because of this extra check.

For completeness, this was the original commit message:

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more
efficient instruction selection if the bitmask is one consecutive sequence of
set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and
set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and
set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit
into the sign bit with one LSLS and change the condition query from NE/EQ to
MI/PL (we could also implement this by shifting into the carry bit and
branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower
zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two
16-bit instructions but can elide the CMP and doesn't require materializing a
complex immediate, so is also a win.

Differential Revision: https://reviews.llvm.org/D27761

llvm-svn: 289794
2016-12-15 09:38:59 +00:00
Dylan McKay 4b028e2ee1 [AVR] Add argument indices to the instrumention hook functions
This allows the instrumention hook functions to do better
pretty-printing.

llvm-svn: 289793
2016-12-15 09:38:09 +00:00
Prakhar Bahuguna 13e9921ccc Fix for build warning in execute-only support
llvm-svn: 289788
2016-12-15 08:42:04 +00:00
Prakhar Bahuguna e640c6f765 Allow ELF section flags to be specified numerically
Summary:
GAS already allows flags for sections to be specified directly as a
numeric value. This functionality is particularly useful for setting
processor or application-specific values that may not be directly
supported or understood by LLVM. This patch allows LLVM to use numeric
section flag values verbatim if specified by the assembly file.

Reviewers: grosbach, rafael, t.p.northover, rengolin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27451

llvm-svn: 289785
2016-12-15 07:59:15 +00:00
Prakhar Bahuguna 52a7dd7d78 [ARM] Implement execute-only support in CodeGen
This implements execute-only support for ARM code generation, which
prevents the compiler from generating data accesses to code sections.
The following changes are involved:

* Add the CodeGen option "-arm-execute-only" to the ARM code generator.
* Add the clang flag "-mexecute-only" as well as the GCC-compatible
  alias "-mpure-code" to enable this option.
* When enabled, literal pools are replaced with MOVW/MOVT instructions,
  with VMOV used in addition for floating-point literals. As the MOVT
  instruction is required, execute-only support is only available in
  Thumb mode for targets supporting ARMv8-M baseline or Thumb2.
* Jump tables are placed in data sections when in execute-only mode.
* The execute-only text section is assigned section ID 0, and is
  marked as unreadable with the SHF_ARM_PURECODE flag with symbol 'y'.
  This also overrides selection of ELF sections for globals.

llvm-svn: 289784
2016-12-15 07:59:08 +00:00
Sanjoy Das 93b1de0f8c Add missing -mtriple to MIR test case
llvm-svn: 289779
2016-12-15 07:13:50 +00:00
Yaxun Liu 6f8d90999e Attempt to fix llvm-readobj crash on ppc64 due to r289674
llvm-svn: 289777
2016-12-15 06:59:23 +00:00
Daniel Jasper befe7a3fc4 Fix go bindings after r289702 (hopefully, don't really know how to build
them, build.sh seems to be broken).

llvm-svn: 289775
2016-12-15 06:54:29 +00:00
Kostya Serebryany 628b43aab6 [libFuzzer] enable the failure-resistant merge by default (with trace-pc-guard only)
llvm-svn: 289772
2016-12-15 06:21:21 +00:00
Dylan McKay dc58eb543f [AVR] Whitelist the avrlit config environment variables
This allows us to use `lit` to run on-target execution tests.

llvm-svn: 289769
2016-12-15 06:04:53 +00:00
Hal Finkel f19e114237 Revert part of r289765 that is not necessary
CS.doesNotAccessMemory(ArgNo) and CS.onlyReadsMemory(ArgNo) calls
dataOperandHasImpliedAttr, so revert this part of r289765 because
it should not be necessary.

llvm-svn: 289768
2016-12-15 05:50:45 +00:00
Hal Finkel 34f9d6ac11 Trying to fix NDEBUG build after r289764
llvm-svn: 289766
2016-12-15 05:33:19 +00:00
Hal Finkel 39fed399e1 Fix argument attribute queries with bundle operands
When iterating over data operands in AA, don't make argument-attribute-specific
queries on bundle operands. Trying to fix self hosting...

llvm-svn: 289765
2016-12-15 05:09:15 +00:00
Sanjoy Das d7389d6261 [MachineBlockPlacement] Don't make blocks "uneditable"
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:

 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.

 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block to a unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.

 3. When placing this block MBP fails to analyze and thus update the
    block in keeping with the new physical layout.

Concretely, before (1) we have something like:

```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B
```

In (2), analyze branch simplifies this to

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B
```

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A
```

and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.

There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.

Reviewers: chandlerc, gberry, MatzeB, iteratee

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27783

llvm-svn: 289764
2016-12-15 05:08:57 +00:00
Craig Topper ab5f355d8c [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel 321053a7ca Fix iterator-invalidation issue
Inserting a new key into a DenseMap potentially invalidates iterators into that
map. Trying to fix an issue from r289755 triggering this assertion:

  Assertion `isHandleInSync() && "invalid iterator access!"' failed.

llvm-svn: 289757
2016-12-15 03:30:40 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Eli Friedman db07ebbab6 Add testcases for some shuffle bugs.
See https://llvm.org/bugs/show_bug.cgi?id=31301 and
https://llvm.org/bugs/show_bug.cgi?id=31364 .

llvm-svn: 289751
2016-12-15 01:47:15 +00:00
Nico Weber d43d3ba5cd Fix test/tools/lto/hide-linkonce-odr.ll after r289719
llvm-svn: 289750
2016-12-15 01:31:38 +00:00
Justin Lebar 7853d3b9dd [NVPTX] Remove dead #defines from NVPTXUtilities.h.
llvm-svn: 289747
2016-12-15 00:45:06 +00:00
Joerg Sonnenberger 400e7b7811 Use PIC relocation model as default for PowerPC64 ELF.
Most of the PowerPC64 code generation for the ELF ABI is already PIC.
There are four main exceptions:
(1) Constant pointer arrays etc. should in writeable sections.
(2) The TOC restoration NOP after a call is needed for all global
symbols. While GNU ld has a workaround for questionable GCC self-calls,
we trigger the checks for calls from COMDAT sections as they cross input
sections and are therefore not considered self-calls. The current
decision is questionable and suboptimal, but outside the scope of the
change.
(3) TLS access can not use the initial-exec model.
(4) Jump tables should use relative addresses. Note that the current
encoding doesn't work for the large code model, but it is more compact
than the default for any non-trivial jump table. Improving this is again
beyond the scope of this change.

At least (1) and (3) are assumptions made in target-independent code and
introducing additional hooks is a bit messy. Testing with clang shows
that a -fPIC binary is 600KB smaller than the corresponding -fno-pic
build. Separate testing from improved jump table encodings would explain
only about 100KB or so. The rest is expected to be a result of more
aggressive immediate forming for -fno-pic, where the -fPIC binary just
uses TOC entries.

This change brings the LLVM output in line with the GCC output, other
PPC64 compilers like XLC on AIX are known to produce PIC by default
as well. The relocation model can still be provided explicitly, i.e.
when using MCJIT.

One test case for case (1) is included, other test cases with relocation
mode sensitive behavior are wired to static for now. They will be
reviewed and adjusted separately.

Differential Revision: https://reviews.llvm.org/D26566

llvm-svn: 289743
2016-12-15 00:01:53 +00:00
Justin Lebar a091da75b2 [AMDGPU] Fix runtime-metadata.ll test so it doesn't leave an object file in the source tree.
llvm-svn: 289742
2016-12-14 23:24:43 +00:00
Justin Lebar a54f4d7052 [NVPTX] Remove dead code.
I've chosen to remove NVPTXInstrInfo::CanTailMerge but not
NVPTXInstrInfo::isLoadInstr and isStoreInstr (which are also dead)
because while the latter two are reasonably useful utilities, the former
cannot be used safely: It relies on successful address space inference
to identify writes to shared memory, but addrspace inference is a
best-effort thing.

llvm-svn: 289740
2016-12-14 23:20:40 +00:00
Sanjay Patel afee21a5b2 [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489

llvm-svn: 289738
2016-12-14 22:59:14 +00:00
Davide Italiano 1ebbd176b3 [gold] Add datalayout to two tests where it was missing.
Reported by: thakis via chromium bots.

llvm-svn: 289737
2016-12-14 22:53:43 +00:00
Eugene Zelenko f9f8c68290 [Hexagon] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289736
2016-12-14 22:50:46 +00:00
Greg Clayton 52fe1f68c8 Add the ability to get attribute values as Optional<T>
When getting attributes it is sometimes nicer to use Optional<T> some of the time instead of magic values. I tried to cut over to only using the Optional values but it made many of the call sites very messy, so it makes sense the leave in the calls that can return a default value. Otherwise code that looks like this:

uint64_t CallColumn = Die.getAttributeValueAsAddress(DW_AT_call_line, 0);

Has to be turned into:

uint64_t CallColumn = 0;
if (auto CallColumnValue = Die.getAttributeValueAsAddress(DW_AT_call_line))
    CallColumn = *CallColumnValue;

The first snippet of code looks much better. But in cases where you want an offset that may or may not be there, the following code looks better:

if (auto StmtOffset = Die.getAttributeValueAsSectionOffset(DW_AT_stmt_list)) {
  // Use StmtOffset
}

Differential Revision: https://reviews.llvm.org/D27772

llvm-svn: 289731
2016-12-14 22:38:08 +00:00
Justin Lebar e7bbf7fde3 Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.
llvm-svn: 289730
2016-12-14 22:32:55 +00:00
Justin Lebar 19bf9d2b6d [NVPTX] Support .maxnreg annotation.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D27638

llvm-svn: 289729
2016-12-14 22:32:50 +00:00
Justin Lebar e6867085fa [NVPTX] Remove string constants from NVPTXBaseInfo.h.
Summary:
Previously they were defined as a 2D char array in a header file.  This
is kind of overkill -- we can let the linker lay out these strings
however it pleases.  While we're at it, we might as well just inline
these constants where they're used, as each of them is used only once.

Also move NVPTXUtilities.{h,cpp} into namespace llvm.

Reviewers: tra

Subscribers: jholewinski, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D27636

llvm-svn: 289728
2016-12-14 22:32:44 +00:00
Peter Collingbourne b677fe00fb LibDriver: Reject inputs that are not COFF objects or bitcode files.
Fixes PR31372.

Differential Revision: https://reviews.llvm.org/D27776

llvm-svn: 289726
2016-12-14 22:19:22 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Jan Sjodin 071b0fa06a Revert revision 289721.
llvm-svn: 289723
2016-12-14 21:58:42 +00:00
Jan Sjodin 9419021bba Dummy commit.
llvm-svn: 289721
2016-12-14 21:57:18 +00:00
Davide Italiano ebed410ca0 [LTO] Add the missing datalayout in a test.
llvm-svn: 289720
2016-12-14 21:57:14 +00:00
Davide Italiano 2ceb628f36 [LTO] Reject modules without datalayout.
Also, udpate the ~60 failing tests in the tree which did
not contain a valid datalayout.
This fixes PR31123. lld will be updated in a following patch,
immediately after this is committed.

Differential Revision:  https://reviews.llvm.org/D27082

llvm-svn: 289719
2016-12-14 21:57:04 +00:00
Filipe Cabecinhas dd9688703c [asan] Don't skip instrumentation of masked load/store unless we've seen a full load/store on that pointer.
Reviewers: kcc, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27625

llvm-svn: 289718
2016-12-14 21:57:04 +00:00
Filipe Cabecinhas 1e69017a6d [asan] Hook ClInstrumentWrites and ClInstrumentReads to masked operation instrumentation.
Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27548

llvm-svn: 289717
2016-12-14 21:56:59 +00:00
Dehao Chen a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Eli Friedman cbed30c501 [ARM] Split 128-bit vectors in BUILD_VECTOR lowering
Given that INSERT_VECTOR_ELT operates on D registers anyway, combining
64-bit vectors into a 128-bit vector is basically free. Therefore, try
to split BUILD_VECTOR nodes before giving up and lowering them to a series
of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically
better lowerings; see testcases for examples. Inspired by similar code
in the x86 backend for AVX.

Differential Revision: https://reviews.llvm.org/D27624

llvm-svn: 289706
2016-12-14 20:44:38 +00:00
Nico Weber 53816d074d fix gcc warning about a superfluous ;
llvm-svn: 289705
2016-12-14 20:33:54 +00:00
Robert Lougher cfd7198698 [InstCombine] Folding of a compare with RHS const should merge debug locations
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.

Patch 8 of 8 for D26256.  Folding of a compare that has a RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289704
2016-12-14 20:27:22 +00:00
Eli Friedman 10576e73c9 [ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.
Currently, there are substantial problems forming vld1_dup even if the
VDUP survives legalization. The lack of an actual node
leads to terrible results: not only can we not form post-increment vld1_dup
instructions, but we form scalar pre-increment and post-increment
loads which force the loaded value into a GPR. This patch fixes that
by combining the vdup+load into an ARMISD node before DAGCombine
messes it up.

Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable).

Differential Revision: https://reviews.llvm.org/D27694

llvm-svn: 289703
2016-12-14 20:25:26 +00:00
Amjad Aboud 43c8b6b7b2 [DebugInfo] Changed DIBuilder::createCompileUnit() to take DIFile instead of FileName and Directory.
This way it will be easier to expand DIFile (e.g., to contain checksum) without the need to modify the createCompileUnit() API.

Reviewers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D27762

llvm-svn: 289702
2016-12-14 20:24:54 +00:00
Yaxun Liu 04334b527d Fix build failure due to r289674 on certain systems
Removed a useless include which caused conflict.

llvm-svn: 289700
2016-12-14 20:17:47 +00:00
Robert Lougher c9f7354776 [InstCombine] Folding of a binop with RHS const should merge the debug locations
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.

Patch 7 of 8 for D26256.  Folding of a binop with RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289699
2016-12-14 20:07:49 +00:00
David Blaikie b461468958 DebugInfo: Improve type safety and simplify some subprogram finalization code
This probably ended up this way aften the subprogram<>function link
inversion and debug info metadata schema changes.

llvm-svn: 289697
2016-12-14 19:38:39 +00:00
Geoff Berry ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Andrew Kaylor ce3bcae632 [WinEH] Avoid holding references to BlockColor (DenseMap) entries while inserting new elements
Differential Revision: https://reviews.llvm.org/D27693

llvm-svn: 289694
2016-12-14 19:30:18 +00:00
Robert Lougher f02d9b8325 [InstCombine] When folding casts through a phi node merge the debug locations
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.

Patch 6 of 8 for D26256.  Folding of a cast operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289693
2016-12-14 19:24:01 +00:00
Sean Callanan 62204ad74a Include <cstdarg> in PrettyStackTrace.cpp, fixing the bots.
llvm-svn: 289691
2016-12-14 19:19:53 +00:00
Sean Callanan 032dbf9ee3 Prepare PrettyStackTrace for LLDB adoption
This patch fixes the linkage for __crashtracer_info__, making it have the proper mangling (extern "C") and linkage (private extern).
It also adds a new PrettyStackTrace type, allowing LLDB to adopt this instead of Host::SetCrashDescriptionWithFormat().

Without this patch, CrashTracer on macOS won't pick up pretty stack traces from any LLVM client. 
An LLDB commit adopting this API will follow shortly.

Differential Revision: https://reviews.llvm.org/D27683

llvm-svn: 289689
2016-12-14 19:09:43 +00:00
Robert Lougher 373e36a410 [InstCombine] Folding loads through a phi node should merge the debug locations
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.

Patch 5 of 8 for D26256.  Folding of a load operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289688
2016-12-14 19:02:14 +00:00
Robert Lougher 8fc1e89bbb [InstCombine] When folding GEP through a phi node merge the debug locations
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.

Patch 4 of 8 for D26256.  Folding of a getelementptr operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289684
2016-12-14 18:37:50 +00:00
Eric Christopher ba1024cfb8 This change does two things:
Adds a "Discriminator" field to struct DILineInfo, which defaults to 0.
Fills out the "Discriminator" field in DILineInfo in DWARFDebugLine::LineTable::getFileLineInfoForAddress().

in order to have a slightly nicer interface in getFileLineInfoForAddress.

Patch by Simon Que!

Differential Revision: https://reviews.llvm.org/D27649

llvm-svn: 289683
2016-12-14 18:29:39 +00:00
Robert Lougher 4b0790d488 [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 3 of 8 for D26256.  Folding of a compare operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289681
2016-12-14 18:14:57 +00:00
Kostya Serebryany d9d9a54511 [libFuzzer] disable msan for one more hook that reads target's data that might be uninitialized
llvm-svn: 289680
2016-12-14 18:13:02 +00:00
Robert Lougher 2428a4050f [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 2 of 8 for D26256.  Folding of a binary operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289679
2016-12-14 17:49:19 +00:00
Dehao Chen 23025f8483 revert r289669 which breaks bots
llvm-svn: 289676
2016-12-14 17:23:16 +00:00
Yaxun Liu 07d659bc76 AMDGPU: Emit runtime metadata version 2 as YAML
Differential Revision: https://reviews.llvm.org/D25046

llvm-svn: 289674
2016-12-14 17:16:52 +00:00
Derek Schuff ebd8110aa1 lit.cfg: Check value of build config rather than converting to boolean
This is a CMake var which never evaluates to false.

llvm-svn: 289673
2016-12-14 17:05:34 +00:00
Matt Arsenault bdc0ac0a0e AMDGPU: Make AllocationPriority of SGPRs higher than VGPRs
Since SGPRs should spill to VGPRs, they should be allocated first.
I don't think this is sufficient for SGPRs to always spill to
VGPRs though.

llvm-svn: 289671
2016-12-14 16:52:06 +00:00
Dehao Chen cb61c94d87 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289669
2016-12-14 16:49:28 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Matt Arsenault ebfba7027e AMDGPU: Change vintrp printing
llvm-svn: 289664
2016-12-14 16:36:12 +00:00
Derek Schuff 112b303905 Revert gold part of change, just liblto
llvm-svn: 289663
2016-12-14 16:20:25 +00:00
Derek Schuff 0c2796dc36 Disable libLTO tests when libLTO is not built
Summary:
The current test only checks whether ld64 is available, causing tests
to fail when ld64 is avilable but libLTO is not built.

Reviewers: beanz, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27739

llvm-svn: 289662
2016-12-14 16:20:22 +00:00
Robert Lougher 7bd04e3b2d New API for merging debug locations. NFC.
Given two debug locations the function getMergedLocation combines the
locations into a single location (which may be an empty location).
Please see https://reviews.llvm.org/D26256 for the discussion leading
up to this API.

Note the function is currently a stub.  This allows optimisations to
use the API although no location will actually be used.

This is patch 1 out of 8 for D26256.  As suggested by David Blaikie,
each change in D26256 has been broken out into a separate patch.

llvm-svn: 289661
2016-12-14 16:14:17 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim facbd35696 Wdocumentation fix
llvm-svn: 289655
2016-12-14 15:14:44 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Michael Zuckerman 1ce2a23a1e Fix bug 30945- [AVX512] Failure to flip vector comparison to remove not mask instruction
adding new optimization opportunity by adding new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945.

Test explanation:
Select gets three arguments mask, op and op2. In this case, the Mask is a result of ICMP. The ICMP instruction compares (with equal operand) the zero initializer vector and the result of the first ICMP.

In general, The result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging of the two arguments inside the Select instruction, we can get the same result. Without the necessary of the middle phase ("cmp eq, op1, zero initializers").

Missed optimization opportunity: 
vpcmpled %zmm0, %zmm1, %k0
knotw %k0, %k1

can be combine to 
vpcmpgtd %zmm0, %zmm2, %k1

Reviewers: 
1. delena
2. igorb 

Commited after check all 
Differential Revision: https://reviews.llvm.org/D27160

llvm-svn: 289653
2016-12-14 14:57:10 +00:00
Simon Pilgrim ebe58191c8 [X86][SSE] Add AVX1 tests to sdiv/udiv srem/urem combine tests
As requested on D27714

llvm-svn: 289652
2016-12-14 14:39:51 +00:00
Renato Golin ce1dd3c949 Revert "[AVR] Add the very first on-target test"
This reverts commit r289648, as it's an execution test and relies on the
emulator/dispatcher being available on all builders.

llvm-svn: 289651
2016-12-14 13:24:20 +00:00
Stephan Bergmann 7d94d54a36 Adapt to recent APFloat change
llvm-svn: 289649
2016-12-14 12:11:35 +00:00
Dylan McKay 452e266cd6 [AVR] Add the very first on-target test
This test runs on actual AVR hardware.

llvm-svn: 289648
2016-12-14 12:03:39 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Artur Pilipenko f3ee444010 Add a couple of assertions to the load combine code introduced by r289538
llvm-svn: 289646
2016-12-14 11:55:47 +00:00
Dylan McKay cfd1ce6a52 [AVR] Add the integrated testing tool to the .gitignore
We build it as an LLVM tool.

llvm-svn: 289645
2016-12-14 11:47:14 +00:00
Oliver Stannard 268f42f1ce [Assembler] Better error messages for .org directive
Currently, the error messages we emit for the .org directive when the
expression is not absolute or is out of range do not include the line
number of the directive, so it can be hard to track down the problem if
a file contains many .org directives.

This patch stores the source location in the MCOrgFragment, so that it
can be used for diagnostics emitted during layout.

Since layout is an iterative process, and the errors are detected during
each iteration, it would have been possible for errors to be reported
multiple times. To prevent this, I've made the assembler bail out after
each iteration if any errors have been reported. This will still allow
multiple unrelated errors to be reported in the common case where they
are all detected in the first round of layout.

Differential Revision: https://reviews.llvm.org/D27411

llvm-svn: 289643
2016-12-14 10:43:58 +00:00
Dylan McKay 3abd1d3e12 [AVR] Add a function instrumentation pass
This will be used for an on-chip test suite.

llvm-svn: 289641
2016-12-14 10:15:00 +00:00
Craig Topper aeaa52cc11 [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics.
llvm-svn: 289639
2016-12-14 07:46:12 +00:00
Hal Finkel 065b756528 [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. This will yield subtle bugs if
interposition is attempted. As a result, regardless of whether we're in PIC
mode, we don't assume that we need to add the nop to support the possibility of
ELF interposition. However, the necessary check is in place (i.e. calling
GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for
which interposition is allowed at the IR level, we'll add the nop as necessary.
In the mean time, we'll generate more tail calls and fewer nops when compiling
position-independent code.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 289638
2016-12-14 07:24:50 +00:00