Commit Graph

40260 Commits

Author SHA1 Message Date
Pierre Gousseau b6d652adb5 [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  To:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248
2016-10-14 16:41:38 +00:00
Sanjay Patel 6d6eca5cdc [InstCombine] use m_APInt to allow sub with constant folds for splat vectors
llvm-svn: 284247
2016-10-14 16:31:54 +00:00
Sanjay Patel ecd0da2619 [InstCombine] add tests for missing vector folds
llvm-svn: 284245
2016-10-14 15:55:34 +00:00
Sanjay Patel a3bc38b36d [InstCombine] auto-generate checks
llvm-svn: 284244
2016-10-14 15:41:25 +00:00
Sanjay Patel 0b611dcabf [InstCombine] remove redundant test
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.

llvm-svn: 284243
2016-10-14 15:36:28 +00:00
Sanjay Patel ad0757febb [InstCombine] update test to use FileCheck and auto-generate checks
llvm-svn: 284242
2016-10-14 15:30:31 +00:00
Sanjay Patel c6c5965a42 [InstCombine] sub X, sext(bool Y) -> add X, zext(bool Y)
Prefer add/zext because they are better supported in terms of value-tracking.

Note that the backend should be prepared for this IR canonicalization 
(including vector types) after:
https://reviews.llvm.org/rL284015

Differential Revision: https://reviews.llvm.org/D25135

llvm-svn: 284241
2016-10-14 15:24:31 +00:00
Sanjay Patel 00fc7a6159 [DAG] add folds for negated shifted sign bit
The same folds exist in InstCombine already.

This came up as part of:
https://reviews.llvm.org/D25485

llvm-svn: 284239
2016-10-14 14:26:47 +00:00
Sanjay Patel 7b4e4afb61 [x86] add tests to show missing folds for negated shifted sign bit
llvm-svn: 284238
2016-10-14 14:14:40 +00:00
Nicolai Haehnle 67624af0cc AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Diana Picus 68c7b04e8d [GlobalISel] Get the AArch64 tests to work on Linux
Mostly this just means changing the triple from aarch64-apple-ios to the generic
aarch64--. Only one test needs more significant changes, but GlobalISel already
does the right thing so it's ok to just change the checks.

Differential Revision: https://reviews.llvm.org/D25532

llvm-svn: 284223
2016-10-14 10:19:40 +00:00
Simon Dardis b3fd189cb5 [mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D21473

llvm-svn: 284218
2016-10-14 09:31:42 +00:00
Craig Topper 40feb7f157 [DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.

llvm-svn: 284204
2016-10-14 06:00:42 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov 2a2ac37c2c [AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations
Differential Revision: https://reviews.llvm.org/D25548

llvm-svn: 284195
2016-10-14 04:21:32 +00:00
Konstantin Zhuravlyov ee68fdadfe [Support/ELF/AMDGPU] Add 32-bit lo/hi got and pc relative relocations
Added relocation names:
  - R_AMDGPU_GOTPCREL32_LO
  - R_AMDGPU_GOTPCREL32_HI
  - R_AMDGPU_REL32_LO
  - R_AMDGPU_REL32_HI

AMDGPU isa only supports 32-bit immediates. In order to access 64-bit address we need to generate 32-bit lo/hi relocations, and do the right math (separate patch). Currently we only generate one 32 bit relocation for lower bits for each access, losing higher bits. Hence we need relocations listed above.

Differential Revision: https://reviews.llvm.org/D25546

llvm-svn: 284191
2016-10-14 04:03:49 +00:00
Saleem Abdulrasool 7705c4f1be CodeGen: use MSVC division on windows itanium
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.

llvm-svn: 284175
2016-10-13 23:00:11 +00:00
Saleem Abdulrasool 06383dd272 CodeGen: adjust floating point operations in Windows itanium
Windows itanium is equivalent to MSVC except in C++ mode.  Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.

llvm-svn: 284173
2016-10-13 22:38:15 +00:00
Sriraman Tallam f29fa586e1 New llc option pie-copy-relocations to optimize access to extern globals.
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.

Differential Revision: https://reviews.llvm.org/D24849

llvm-svn: 284160
2016-10-13 20:54:39 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
David L Kreitzer d9ca3589de [safestack] Move X86-targeted tests into the X86 subdirectory.
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D25340

llvm-svn: 284139
2016-10-13 17:51:59 +00:00
Reid Kleckner edfc9dcf42 Truncate long names in type records
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.

Fixes part of PR29098.

llvm-svn: 284136
2016-10-13 17:33:22 +00:00
Igor Breger 8409c356ad [X86][AVX512] Fix sext v32i1 -> v32i8 lowering.
Fix PR30600.

Differential Revision: https://reviews.llvm.org/D25554

llvm-svn: 284134
2016-10-13 17:20:38 +00:00
Reid Kleckner 468e793fea Fix for PR30687. Avoid dereferencing MBB.end().
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.

Based on a patch by David Kreitzer

This bug is a consequence of

r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines

We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.

Differential Revision: https://reviews.llvm.org/D25566

llvm-svn: 284130
2016-10-13 15:48:48 +00:00
Javed Absar 85874a9360 [ARM]: Assign cost of scaling used in addressing mode for ARM cores
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.

For instance:

LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]

Above, (1) takes less cycles than (2).

By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.

Differential Revision: http://reviews.llvm.org/D24857

Reviewers: jmolloy, rengolin
llvm-svn: 284127
2016-10-13 14:57:43 +00:00
Matthew Simpson 1d4b163fc0 [LV] Account for predicated stores in instruction costs
This patch ensures that we scale the estimated cost of predicated stores by
block probability. This is a follow-on patch for r284123.

llvm-svn: 284126
2016-10-13 14:54:31 +00:00
Sanjay Patel 24b6ef7792 [x86] add negate-i1 run for 32-bit target
llvm-svn: 284124
2016-10-13 14:27:08 +00:00
Matthew Simpson 6cdb5a6f96 [LV] Avoid rounding errors for predicated instruction costs
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.

Differential Revision: https://reviews.llvm.org/D25333

llvm-svn: 284123
2016-10-13 14:19:48 +00:00
Simon Pilgrim cb59b5257c [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines
llvm-svn: 284122
2016-10-13 14:04:35 +00:00
Matt Arsenault 253640e18d AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Simon Pilgrim 26b6dbc369 Copy+pasts typo in comment describing combine test
Repeated the "fold (mul x, 0) -> 0" instead of "fold (mul x, 1) -> x"

llvm-svn: 284118
2016-10-13 12:54:32 +00:00
Simon Pilgrim fa8fadc0e5 [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding
llvm-svn: 284117
2016-10-13 12:49:31 +00:00
Simon Dardis 515e8699f4 [mips] Add IAS support for dvp, evp
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24291

llvm-svn: 284115
2016-10-13 12:12:56 +00:00
Simon Pilgrim 833b8a2071 [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization
Improves commutation potential

llvm-svn: 284113
2016-10-13 12:05:20 +00:00
Oren Ben Simhon 92ccbf20ff [X86] Basic additions to support RegCall Calling Convention.
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.

Differential Revision: http://reviews.llvm.org/D25022

llvm-svn: 284108
2016-10-13 07:53:43 +00:00
Craig Topper 3d41f91f61 [AVX-512] Fix v16i32 zero extending shuffle test case so it's really zero extend.
llvm-svn: 284106
2016-10-13 05:41:01 +00:00
Craig Topper ff23af4299 [AVX-512] Teach shuffle lowering to recognize 512-bit zero extends.
llvm-svn: 284105
2016-10-13 05:29:41 +00:00
Craig Topper 05242739c2 [AVX-512] Add tests for basic 512-bit zero extending shuffle patterns. Code will be improved in a future commit.
llvm-svn: 284104
2016-10-13 05:29:37 +00:00
Sebastian Pop 5ba9f24ed7 commit back "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)"
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.

Differential Revision: https://reviews.llvm.org/D25542

llvm-svn: 284098
2016-10-13 01:39:10 +00:00
Quentin Colombet 6b87a3109c [AArch64][RegisterBankInfo] Provide alternative mappings for 64-bit load
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.

llvm-svn: 284097
2016-10-13 01:01:23 +00:00
Reid Kleckner 741d8a21d3 Correct PrivateLinkage for COFF
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
  PrivateLinkage should equal to the Internal Linkage.

- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
  x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
  "L" may conflict to the normal symbol name starting with 'L'.

Based on a patch by Han Sangjin! Manually updated test cases.

llvm-svn: 284096
2016-10-13 00:55:24 +00:00
Quentin Colombet cd80e97e88 [AArch64][RegisterBankInfo] Provide alternative mappings for G_BITCASTs.
Thanks to this patch, RegBankSelect is able to get rid of some register
bank copies as demonstrated in the test case.

llvm-svn: 284094
2016-10-13 00:34:48 +00:00
Reid Kleckner 8958f6a529 Revert "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)"
This CL didn't actually address the test case in PR30499, and clang
still crashes.

Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"

Reverts r283965 and r283967.

llvm-svn: 284093
2016-10-13 00:18:26 +00:00
Quentin Colombet 9e64919b7c [AArch64][RegisterBankInfo] Use static mapping for same bank G_BITCAST.
NFC.

llvm-svn: 284090
2016-10-13 00:12:04 +00:00
Quentin Colombet db643d9091 [AArch64][MachineLegalizer] Mark more G_BITCAST as legal.
Basically any vector types that fits in a 32-bit register is also valid
as far as copies are concerned.

llvm-svn: 284089
2016-10-13 00:12:01 +00:00
Albert Gutowski 3245ee7e57 fix function label name in addressofreturnaddress test
llvm-svn: 284085
2016-10-12 23:58:45 +00:00
Krzysztof Parzyszek abc0662f04 Handle lane masks in LivePhysRegs when adding live-ins
Differential Revision: https://reviews.llvm.org/D25533

llvm-svn: 284076
2016-10-12 22:53:41 +00:00
Tim Northover fb8d989818 GlobalISel: support G_TRUNC selection on AArch64.
Ahmed's patch again.

llvm-svn: 284075
2016-10-12 22:49:15 +00:00
Tim Northover 69271c64d5 GlobalISel: support int <-> float conversions on AArch64.
More of Ahmed's work.

llvm-svn: 284074
2016-10-12 22:49:11 +00:00
Tim Northover 7dd378dd08 GlobalISel: select G_FCMP instructions on AArch64.
Another of Ahmed's patches.

llvm-svn: 284073
2016-10-12 22:49:07 +00:00
Tim Northover 6c02ad5e4f GlobalISel: support selection of G_ICMP on AArch64.
Patch from Ahmed Bougaca again.

llvm-svn: 284072
2016-10-12 22:49:04 +00:00
Tim Northover 5e3dbf326c GlobalISel: select G_BRCOND instructions on AArch64.
llvm-svn: 284071
2016-10-12 22:49:01 +00:00
Tim Northover 6aacd27cd7 GlobalISel: mark G_BRCOND on s1 as legal.
It's going to be a TBNZ (at -O0) anyway, so the high bits don't matter.

llvm-svn: 284070
2016-10-12 22:48:36 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Haicheng Wu 1ef17e90b2 Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop"
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.

The original summary:

This patch tries to fully unroll loops having break statement like this

for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
        found = true;
        break;
    }
}

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.

The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.

llvm-svn: 284053
2016-10-12 21:29:38 +00:00
Krzysztof Parzyszek d62669d791 [MIRParser] Parse lane masks for register live-ins
Differential Revision: https://reviews.llvm.org/D25530

llvm-svn: 284052
2016-10-12 21:06:45 +00:00
Haicheng Wu 45e4ef737d Revert "[LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop"
This reverts commit r284044.

llvm-svn: 284051
2016-10-12 21:02:22 +00:00
Krzysztof Parzyszek 3cb5ffeb35 Fix testcases failing after r284036
The codegen has changed slightly between my tests and the commit.

llvm-svn: 284049
2016-10-12 20:39:33 +00:00
Haicheng Wu 6cac34fd41 [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop
This patch tries to fully unroll loops having break statement like this

for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
        found = true;
        break;
    }
}

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.

The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.

Differential Revision: https://reviews.llvm.org/D24790

llvm-svn: 284044
2016-10-12 20:24:32 +00:00
Peter Collingbourne 46aafc16e8 LTO: Use the correct mangler function in LTOCodeGenerator::applyScopeRestrictions().
We need to use the overload of Mangler::getNameWithPrefix that takes a
GlobalValue in order to mangle in the stdcall stack byte count for Windows
targets.

Differential Revision: https://reviews.llvm.org/D25529

llvm-svn: 284040
2016-10-12 20:12:19 +00:00
Krzysztof Parzyszek 8271be9a1d Do not remove implicit defs in BranchFolder
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.

Differential Revision: https://reviews.llvm.org/D25478

llvm-svn: 284036
2016-10-12 19:50:57 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Teresa Johnson 4b9b379172 [ThinLTO] Don't link module level assembly when importing
Module inline asm was always being linked/concatenated
when running the IRLinker. This is correct for full LTO but not when
we are importing for ThinLTO, as it can result in multiply defined
symbols when the module asm defines a global symbol.

In order to test with llvm-lto2, I had to work around PR30396,
where a symbol that is defined in module assembly but defined in the
LLVM IR appears twice. Added workaround to llvm-lto2 with a FIXME.

Fixes PR30610.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25359

llvm-svn: 284030
2016-10-12 18:39:29 +00:00
Sanjoy Das bc357e8fa3 [SimplifyCFG] Don't create PHI nodes for constant bundle operands
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness.  I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.

It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.

Reviewers: jmolloy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D25502

llvm-svn: 284028
2016-10-12 18:15:33 +00:00
Matt Arsenault cc88ce36ed AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

llvm-svn: 284027
2016-10-12 18:00:51 +00:00
Zvi Rackover 025e8614ab [X86] Add the v4i32 flavor test-case for pr30371
llvm-svn: 284025
2016-10-12 17:06:30 +00:00
Tom Stellard fac248cb5f AMDGPU/SI: Change mimg intrinsic signatures
This makes more fields overridable and removes redundant bits.

Patch by: Changpeng Fang

llvm-svn: 284024
2016-10-12 16:35:29 +00:00
Artur Pilipenko c6eb6bd9cb [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.

Patch by Li Huang

Differential Revision: https://reviews.llvm.org/D18777

llvm-svn: 284022
2016-10-12 16:18:43 +00:00
Nirav Dave d046332679 [MC] Fix Error Location for ParseIdentifier
Prevent partial parsing of '$' or '@' of invalid identifiers and fixup
workaround points. NFC Intended.

llvm-svn: 284017
2016-10-12 13:58:07 +00:00
Simon Pilgrim 08190943cb [DAGCombiner] Update most ADD combines to support general vector combines
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.

Differential Revision: https://reviews.llvm.org/D25374

llvm-svn: 284015
2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov 081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Chad Rosier c215c3fd14 [CVP] Convert an AShr to a LShr if 1st operand is known to be nonnegative.
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).

PR30577

Differential Revision: https://reviews.llvm.org/D25119

llvm-svn: 284013
2016-10-12 13:41:38 +00:00
Simon Pilgrim fd0d7b21e0 [InstCombine] Fix constexpr issue in select combining
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.

Differential Revision: https://reviews.llvm.org/D25466

llvm-svn: 284000
2016-10-12 10:20:15 +00:00
Diana Picus 40f9341154 Add AArch64 unit tests
Add unit tests for checking a few tricky instruction sizes. Also remove the old
tests for the instruction sizes, which were clunky and brittle.

Since this is the first set of target-specific unit tests, we need to add some
CMake plumbing. In the future, adding unit tests for a given target will be as
simple as creating a directory with the same name as the target under
unittests/Target. The tests are only run if the target is enabled in
LLVM_TARGETS_TO_BUILD.

Differential Revision: https://reviews.llvm.org/D24548

llvm-svn: 283990
2016-10-12 09:00:44 +00:00
Quentin Colombet a907b5ca7c [AArch64][InstructionSelector] Fix unintended test changes in r283973.
I screwed up my merge conflict and lost some of the CHECK lines.

llvm-svn: 283974
2016-10-12 04:12:44 +00:00
Quentin Colombet 9de30faeac [AArch64][InstrustionSelector] Teach the selector about G_BITCAST.
llvm-svn: 283973
2016-10-12 03:57:52 +00:00
Quentin Colombet cb629a897c [AArch64][InstructionSelector] Refactor the handling of copies.
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.

In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!

llvm-svn: 283972
2016-10-12 03:57:49 +00:00
Quentin Colombet 5a0f5d4831 [AArch64][InstructionSelector] Fix typos in the related mir file. NFC.
llvm-svn: 283971
2016-10-12 03:57:46 +00:00
Quentin Colombet 404e4350dc [AArch64][MachineLegalizer] Mark more bitcasts as legal.
Those are copies, we do not have to do any legalization action for them.

llvm-svn: 283970
2016-10-12 03:57:43 +00:00
Sebastian Pop ab12fb62ee GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.

To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.

The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().

Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.

Tested on x86_64-linux with check and a test-suite run.

Differential Revision: https://reviews.llvm.org/D25476

llvm-svn: 283965
2016-10-12 02:23:39 +00:00
Tim Shen 4ff62b187e [PPCMIPeephole] Fix splat elimination
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
  B = Splat A
  C = Splat B
=>
  C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
  B = Splat A
  C = Splat B
=>
  B = Splat A
  C = COPY B

Fixes PR30663.

Reviewers: echristo, iteratee, kbarton, nemanjai

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25493

llvm-svn: 283961
2016-10-12 00:48:25 +00:00
Michael Kuperstein 7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Tim Northover c1d8c2bf8c GlobalISel: support same-size casts on AArch64.
Mostly Ahmed's work again, I'm just sprucing things up slightly before
committing.

llvm-svn: 283952
2016-10-11 22:29:23 +00:00
Vedant Kumar de2ae96710 [InstrProf] Add support for dead_strip+live_support functionality
On Darwin, marking a section as "regular,live_support" means that a
symbol in the section should only be kept live if it has a reference to
something that is live. Otherwise, the linker is free to dead-strip it.

Turn this functionality on for the __llvm_prf_data section.

This means that counters and data associated with dead functions will be
removed from dead-stripped binaries. This will result in smaller
profiles and binaries, and should speed up profile collection.

Tested with check-profile, llvm-lit test/tools/llvm-{cov,profdata}, and
check-llvm.

Differential Revision: https://reviews.llvm.org/D25456

llvm-svn: 283947
2016-10-11 21:48:16 +00:00
Reid Kleckner bdfc05ff93 Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
Reverts r283938 to reinstate r283867 with a fix.

The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.

llvm-svn: 283942
2016-10-11 21:14:03 +00:00
Kevin Enderby 68fffa8a62 Next set of additional error checks for invalid Mach-O files for the
load commands that uses the MachO::linker_option_command
type but not used in llvm libObject code but used in llvm tool code.

This includes just LC_LINKER_OPTION load command.

llvm-svn: 283939
2016-10-11 21:04:39 +00:00
Reid Kleckner f4876beb2b Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"
This reverts r283867.

This appears to be an infinite loop:

    while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) {
      if (HiRegsToSave.count(*HiRegToSave)) {
        ...

        CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end());
        HiRegToSave =
            findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end());
      }
    }

llvm-svn: 283938
2016-10-11 20:54:41 +00:00
Tim Northover 3d38b3a4d1 GlobalISel: support selection of extend operations.
Patch mostly by Ahmed Bougaca.

llvm-svn: 283937
2016-10-11 20:50:21 +00:00
Kyle Butt 0846e56e63 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283934
2016-10-11 20:36:43 +00:00
Sanjay Patel c2bd185dc4 [x86] add tests for negate bool
llvm-svn: 283930
2016-10-11 20:15:20 +00:00
Kostya Serebryany 4d25ad93f3 [sanitizer-coverage] use private linkage for coverage guards, delete old commented-out code.
llvm-svn: 283924
2016-10-11 19:36:50 +00:00
Konstantin Zhuravlyov 78f3fa774b [AMDGPU] Fix test that was broken by rL283893
llvm-svn: 283911
2016-10-11 18:16:56 +00:00
Sanjay Patel 8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel 58b4987284 [x86] add sext variants of tests added with r283894
llvm-svn: 283903
2016-10-11 16:49:52 +00:00
Bernard Ogden c5164132fe Let test pass for builds that support X86, but do not default to it
Differential Revision: https://reviews.llvm.org/D25471

llvm-svn: 283902
2016-10-11 16:34:49 +00:00
Bernard Ogden 2cf7ca3f35 Fix test on non-x86 hosts
Summary:
This test is allowed to run on non-x86 hosts and thus must use
llvm-nm rather than nm.

Differential Revision: https://reviews.llvm.org/D25473

llvm-svn: 283901
2016-10-11 16:32:37 +00:00
Sanjay Patel 8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel 9b3c8a7321 [x86] add tests to show missed folds for masked bools
llvm-svn: 283894
2016-10-11 16:04:37 +00:00
Changpeng Fang 98317d20f4 AMDGPU/SI: Update ISA version numbers for Tonga and Polaris10/11.
Differential Revision:
  http://reviews.llvm.org/D25454

Reviewers:
  tstellarAMD

llvm-svn: 283893
2016-10-11 16:00:47 +00:00
Simon Pilgrim 5b8627aada [X86][SSE] Regenerate scalar i64 uitofp test
Added 32-bit target test

llvm-svn: 283883
2016-10-11 14:01:38 +00:00
Simon Pilgrim 092cfc597f [X86][SSE] Regenerate vector load-trunc test
llvm-svn: 283881
2016-10-11 13:55:49 +00:00
Simon Pilgrim fe9fa7314c [X86][SSE] Regenerate vsplit and tests
To make it more obvious how bad some of that truncation code is....

llvm-svn: 283880
2016-10-11 13:51:44 +00:00
Sanjay Patel 6d71f7b348 [x86] update test to use FileCheck and auto-generate checks
llvm-svn: 283876
2016-10-11 13:36:07 +00:00
Oliver Stannard d2083fb356 [Thumb] Save/restore high registers in Thumb1 pro/epilogues
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.

This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.

In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.

We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.

There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.

In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.

Differential Revision: https://reviews.llvm.org/D24228

llvm-svn: 283867
2016-10-11 10:12:25 +00:00
Oliver Stannard 50a74393c2 [ARM] Fix registers clobbered by SjLj EH on soft-float targets
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.

Differential Revision: https://reviews.llvm.org/D25180

llvm-svn: 283866
2016-10-11 10:06:59 +00:00
Diana Picus c93518db8c [AArch64] Allow label arithmetic with add/sub/cmp
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.

Fixes https://llvm.org/bugs/show_bug.cgi?id=18920

Differential Revision: https://reviews.llvm.org/D23834

llvm-svn: 283862
2016-10-11 09:17:47 +00:00
George Rimar 5fecfaadc9 Reverted r283740 [Object/ELF] - Do not crash on invalid Header->e_shoff value.
Bot does not like it: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/17075

/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/test/Object/invalid.test:70:32: error: expected string not found in input
INVALID-SEC-ADDRESS-ALIGNMENT: Invalid address alignment of section headers
                               ^
<stdin>:1:1: note: scanning from here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
^
<stdin>:1:125: note: possible intended match here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
          

llvm-svn: 283858
2016-10-11 08:12:27 +00:00
Daniel Jasper 0c42dc4784 Revert "Codegen: Tail-duplicate during placement."
This reverts commit r283842.

test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.

llvm-svn: 283857
2016-10-11 07:36:11 +00:00
Matthias Braun 74ad41c7cd MIRParser: Rewrite register info initialization; mostly NFC
This changes MachineRegisterInfo to be initializes after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class specification on the operand or deduce them from the
MCInstrDesc.

This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.

This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.

NFC except for some error messages.

Differential Revision: https://reviews.llvm.org/D22397

llvm-svn: 283848
2016-10-11 03:13:01 +00:00
Kyle Butt ae068a320c Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283842
2016-10-11 01:20:33 +00:00
Dylan McKay c328fe5af4 [RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.

This patch changes this by attempting to split all live intervals before
performing recoloring.

This fixes LLVM bug PR14879.

I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.

Reviewers: qcolombet

Subscribers: MatzeB

Differential Revision: https://reviews.llvm.org/D25070

llvm-svn: 283838
2016-10-11 01:04:36 +00:00
David Majnemer 80dca0c78f [InstCombine] Transform !range metadata to !nonnull when combining loads
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.

This fixes PR30597.

Patch by Ariel Ben-Yehuda!

Differential Revision: https://reviews.llvm.org/D25215

llvm-svn: 283836
2016-10-11 01:00:45 +00:00
Quentin Colombet d2623f8e38 [AArch64][InstructionSelector] Teach how to select FP load/store.
This patch allows to select 32 and 64-bit FP load and store.

llvm-svn: 283832
2016-10-11 00:21:14 +00:00
Quentin Colombet 0e5312787e [AArch64][InstructionSelector] Teach the selector how to handle vector OR.
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.

llvm-svn: 283831
2016-10-11 00:21:11 +00:00
Quentin Colombet d3126d5fb4 [AArch64][MachineLegalizer] Mark v2s32 G_LOAD as legal.
Actually every 64-bit loads are legal, but right now the API does not
offer a simple way to express that.

llvm-svn: 283829
2016-10-11 00:21:08 +00:00
Sanjay Patel 3013a62dd8 [x86] auto-generate checks
llvm-svn: 283812
2016-10-10 22:04:12 +00:00
Sanjay Patel b493cdaabf [x86] auto-generate checks
llvm-svn: 283811
2016-10-10 22:01:42 +00:00
Tim Northover bdf1624367 GlobalISel: select G_GLOBAL_VALUE uses on AArch64.
llvm-svn: 283809
2016-10-10 21:50:00 +00:00
Tim Northover ad0acca544 GlobalISel: allow G_GLOBAL_VALUEs in AArch64 legalization.
llvm-svn: 283808
2016-10-10 21:49:53 +00:00
Tim Northover 2fda4b08ae GlobalISel: support selecting G_GEP instructions.
They're basically just an alias for G_ADD on AArch64.

llvm-svn: 283807
2016-10-10 21:49:49 +00:00
Tim Northover 4edc60d785 GlobalISel: support selecting constants on AArch64.
llvm-svn: 283806
2016-10-10 21:49:42 +00:00
Hal Finkel fcd2421667 [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.

Patch by Edward Jones, thanks!

Differential Revision: https://reviews.llvm.org/D24459

llvm-svn: 283797
2016-10-10 20:45:15 +00:00
Adrian Prantl 3bfe1093df Teach llvm::StripDebugInfo() about global variable !dbg attachments.
This is a regression introduced by the global variable ownership
reversal performed in r281284.

rdar://problem/28448075

llvm-svn: 283784
2016-10-10 17:53:33 +00:00
Alexandros Lamprineas 20e9ddba73 [ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEON
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.

The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.

Differential Revision: https://reviews.llvm.org/D25281

llvm-svn: 283763
2016-10-10 16:01:54 +00:00
Zvi Rackover 2a21f125bd [X86] Prefer rotate by 1 over rotate by imm
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.

Fixes pr30644.

Reviewers: delena, igorb, craig.topper, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D25399

llvm-svn: 283758
2016-10-10 14:43:55 +00:00
Simon Pilgrim cfef627b1f [SLPVectorizer][X86] Add 512-bit sitofp/uitofp tests
llvm-svn: 283756
2016-10-10 14:28:06 +00:00
Simon Pilgrim 2c0733c678 [SLPVectorizer][X86] Add avx512 sitofp/uitofp tests
llvm-svn: 283751
2016-10-10 14:14:31 +00:00
Simon Pilgrim 6cadb5610e [SLPVectorizer][X86] Fixed alignments of scalar loads in sitofp/uitofp tests
Fixed copy+paste vector alignment to correct for per-element scalar loads

Increased to 512-bit data sizes in preparation of avx512 tests

llvm-svn: 283748
2016-10-10 14:10:41 +00:00
Simon Pilgrim 4aea8e8a39 Fixed windows stdout/stderr redirection in inline asm constraint tests
llvm-svn: 283741
2016-10-10 11:11:27 +00:00
George Rimar e4dce5ce3e [Object/ELF] - Do not crash on invalid Header->e_shoff value.
sections_begin() may return unalignment pointer when Header->e_shoff isinvalid.
That may result in a crash in clients, for example we have one in LLD:

assert((PtrWord & ~PointerBitMask) == 0 &&
       "Pointer is not sufficiently aligned");
fails when trying to push_back Elf_Shdr* (unaligned) into TinyPtrVector.

Patch forces check for alignment of Header->e_shoff.

Differential revision: https://reviews.llvm.org/D25368

llvm-svn: 283740
2016-10-10 10:51:38 +00:00
Chris Dewhurst 850131213f This pass, fixing an erratum in some LEON 2 processors ensures that the SDIV instruction is not issued, but replaced by SDIVcc instead, which does not exhibit the error. Unit test included.
Differential Review: https://reviews.llvm.org/D24660

llvm-svn: 283727
2016-10-10 08:53:06 +00:00
Craig Topper 9ece2f7529 [AVX-512] Add missing pattern sext or zext from bytes to quad words with a 128-bit load as input.
llvm-svn: 283720
2016-10-10 06:25:48 +00:00
Craig Topper 0f905027b3 [AVX-512] Add test cases for AVX512 sign/zero extend instructions derived from the sse41 and avx2 test cases. Code will be improved in future commits.
llvm-svn: 283719
2016-10-10 06:25:45 +00:00
Craig Topper aba15075da [AVX-512] Add an AVX512VL/BW command line to sse41-pmovxrm.ll and avx2-pmovxrm.ll. Also disable peephole so we really test pattern matching.
llvm-svn: 283718
2016-10-10 06:25:42 +00:00
Michael Zuckerman 3eeac2d56b [x86][inline-asm][llvm] accept 'v' constraint
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)

This patch applies the needed changes to clang
 clang patch: https://reviews.llvm.org/D25004

Differential Revision: D25005
 

llvm-svn: 283717
2016-10-10 05:48:56 +00:00
Craig Topper 64378f4378 [AVX-512] Port 128 and 256-bit memory->register sign/zero extend patterns from SSE file. Also add a minimal set for 512-bit.
llvm-svn: 283704
2016-10-09 23:08:39 +00:00
Zvi Rackover b764bf2987 [X86] Adding the 'nounwind' attribute to test functions for cleaner generated code
Thanks to RKSimon for the suggestion.

llvm-svn: 283696
2016-10-09 13:33:51 +00:00
Zvi Rackover f841080caf [X86] Improve the rotate ISel test
Summary:
- Added 64-bit target testing.
- Added 64-bit operand test cases.
- Added cases that demonstrate pr30644

Reviewers: RKSimon, craig.topper, igorb

Differential Revision: https://reviews.llvm.org/D25401

llvm-svn: 283695
2016-10-09 13:07:25 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Craig Topper 43973154dd [AVX-512] Fix execution domain for EVEX encoded VINSERTPS.
llvm-svn: 283692
2016-10-09 06:41:47 +00:00
Craig Topper e30cb00dc0 [AVX-512] Add subvector insert and extract to load/store folding tables.
llvm-svn: 283689
2016-10-09 03:54:13 +00:00
Craig Topper 50a468e03f [AVX-512] Add avx512dq to the fp stack folding test.
llvm-svn: 283688
2016-10-09 03:54:09 +00:00
Craig Topper 4262d53024 [AVX-512] Add the vector down convert instructions to the store folding tables.
llvm-svn: 283687
2016-10-09 03:54:05 +00:00
Mehdi Amini 8ec7b4f588 ThinLTO: Fix Gold test after caching fix in r283655
(I don't have Gold available, so this is speculative)

llvm-svn: 283681
2016-10-08 22:49:28 +00:00
Simon Pilgrim 319c094771 [X86][SSE] Regenerate select tests
llvm-svn: 283674
2016-10-08 21:17:44 +00:00
Zvi Rackover ce4900aaa6 Revert "[X86] Apply the Update LLC Test Checks tool on the rotate tests."
This reverts commit 283667.

llvm-svn: 283673
2016-10-08 20:54:20 +00:00
Simon Pilgrim 9e7a22fc13 [X86][SSE] Regenerate and add 32-bit tests to widening tests
llvm-svn: 283672
2016-10-08 19:54:28 +00:00
Simon Pilgrim 30cbd1ab84 Fix comment typos - full update script path in assertions note
llvm-svn: 283670
2016-10-08 18:51:55 +00:00
Craig Topper 2067142d7d [AVX-512] Add test case for PR30430 that I should have added in r281959.
llvm-svn: 283669
2016-10-08 18:50:00 +00:00