Commit Graph

36314 Commits

Author SHA1 Message Date
Simon Pilgrim 7823fd2535 [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Chad Rosier 05f8020cdf [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga 33b3bd17dd [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Simon Pilgrim 6788f33cf2 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Chad Rosier 18896c0f5e Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Elena Demikhovsky 86528270b9 AVX-512: Fixed a bug in FMA instruction selection on KNL
The FMA instruction was selected from AVX2 set instead of AVX-512

Differential Revision: http://reviews.llvm.org/D16884

llvm-svn: 259792
2016-02-04 15:11:11 +00:00
Chad Rosier feec2aeb0f [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Michael Zuckerman 7d73360479 [AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
Differential Revision: http://reviews.llvm.org/D16589

llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim 1d2d6c5a57 [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.
llvm-svn: 259771
2016-02-04 09:27:19 +00:00
Andrey Turetskiy bca0f99224 [X86] Use hash table in LEA optimization pass.
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.

Differential Revision: http://reviews.llvm.org/D16404

llvm-svn: 259770
2016-02-04 08:57:03 +00:00
Jingyue Wu f650441b04 [NVPTX] Disable performance optimizations when OptLevel==None
Reviewers: jholewinski, tra, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16874

llvm-svn: 259749
2016-02-04 04:15:36 +00:00
Sanjay Patel 460ce9cd9b clean up; NFC
llvm-svn: 259720
2016-02-03 22:37:37 +00:00
Saleem Abdulrasool f36005a358 ARM: support TLS for WoA
Add support for TLS access for Windows on ARM.  This generates a similar access
to MSVC for ARM.

The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call.  The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.

llvm-svn: 259676
2016-02-03 18:21:59 +00:00
Renato Golin 6027dd38ef [ARM] Move GNUEABI divmod to __aeabi_divmod*
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.

Fixes PR26450.

llvm-svn: 259657
2016-02-03 16:10:54 +00:00
Daniel Sanders 3b1a2dbffa [mips] Remove redundant inclusions of MipsAnalyzeImmediate.h
llvm-svn: 259655
2016-02-03 15:54:12 +00:00
Nemanja Ivanovic 82e1168989 Fix for PR 26381
Simple fix - Constant values were not being sign extended in FastIsel.

llvm-svn: 259645
2016-02-03 12:53:38 +00:00
Simon Atanasyan e774126c96 [mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.

Differential Revision: http://reviews.llvm.org/D15740

llvm-svn: 259641
2016-02-03 11:50:22 +00:00
Simon Pilgrim 18bcf93efb [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
Follow up to D16217 and D16729

This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD

Differential Revision: http://reviews.llvm.org/D16768

llvm-svn: 259635
2016-02-03 09:41:59 +00:00
Kyle Butt d62d8b771d Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
    ;v6 = v6 + v5 * v7

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
    ;v5 = v5 * v7 + v96

This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
    ;v6 = v6 + v5 * v6

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
    ;v5 = v5 * v96 + v96

llvm-svn: 259617
2016-02-03 01:41:09 +00:00
Yunzhong Gao eb959722a7 Revert r259576: Disable the vzeroupper insertion pass on PS4.
Will re-implement based on review feedback.

llvm-svn: 259615
2016-02-03 01:25:12 +00:00
Yunzhong Gao b76ccacfb1 Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.

Original patch by: Sean Silva

llvm-svn: 259576
2016-02-02 21:39:23 +00:00
Matt Arsenault de4208122b AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault 7e747f1a38 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Quentin Colombet b8fb2ba1bb [X86] Fix the merging of SP updates in prologue/epilogue insertions.
When the merging was involving LEAs, we were taking the wrong immediate
from the list of operands.

rdar://problem/24446069

llvm-svn: 259553
2016-02-02 20:11:17 +00:00
Matt Arsenault 8b175672cb AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault fb8cdbae0c AMDGPU: Minor cleanups for AMDGPUPromoteAlloca
Mostly convert to use range loops.

llvm-svn: 259550
2016-02-02 19:32:35 +00:00
Matt Arsenault e5737f7cac AMDGPU: Report AMDGPUPromoteAlloca changed the function
llvm-svn: 259547
2016-02-02 19:18:57 +00:00
Matt Arsenault ad1348459f AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault 853a1fc6d9 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Eugene Zelenko ecefe5a81f Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.
Differential revision: http://reviews.llvm.org/D16793

llvm-svn: 259539
2016-02-02 18:20:45 +00:00
Derek Schuff c6d8fd3f54 [MC] Enable eip-relative addressing on x86-64 for X32 ABI
Summary:
Enables eip-based addressing, e.g.,

lea    constant(%eip), %rax
lea    constant(%eip), %eax

in MC, (used for the x32 ABI). EIP-base addressing is also valid in x86_64,
it is left enabled for that architecture as well.

Patch by João Porto

Differential Revision: http://reviews.llvm.org/D16581

llvm-svn: 259528
2016-02-02 17:20:04 +00:00
Chad Rosier 1142f3cf90 [AArch64] Add a FIXME comment.
llvm-svn: 259515
2016-02-02 15:22:55 +00:00
Chad Rosier bba881ef3d [AArch64] Allocate the modified and used regs only once per function.
llvm-svn: 259510
2016-02-02 15:02:30 +00:00
JF Bastien 926b189a81 WebAssembly: update expected GCC torture test failures
The 3 programs used __attribute__((mode(?))) on enum, which clang r259497 fixed.

llvm-svn: 259508
2016-02-02 14:27:34 +00:00
Oliver Stannard 7e7d983a87 Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00
Simon Pilgrim 96fe4ef5f7 [X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding
llvm-svn: 259496
2016-02-02 13:32:56 +00:00
JF Bastien dc1255f02f WebAssembly: add option to disable register coloring
Having this hidden option makes it easier to debug other issues.

llvm-svn: 259482
2016-02-02 09:30:01 +00:00
Sjoerd Meijer ffe19f5245 Removed FeatureVFPOnlySP from the Cortex-R7 processor model
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.

llvm-svn: 259480
2016-02-02 09:28:20 +00:00
Sanjoy Das 881de4d12a [X86] Fix a bug in getMemOpBaseRegImmOfs
Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of
`MemOp` is a frame index memory operand.  The fix is to have
`getMemOpBaseRegImmOfs` bail out in such cases.  We can possibly be more
clever here, if needed.

llvm-svn: 259456
2016-02-02 02:32:43 +00:00
Ahmed Bougacha 68a8efa374 [X86][FastISel] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
FastISel counterpart to r259448.

llvm-svn: 259449
2016-02-02 01:44:03 +00:00
Ahmed Bougacha 55c6682ae2 [X86] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
Officially, we don't acknowledge non-default configurations of MXCSR,
as getting there would require usage of the FENV_ACCESS pragma (at
least insofar as rounding mode is concerned).

We don't support the pragma, so we can assume that the default
rounding mode - round to nearest, ties to even - is always used.

However, it's inconsistent with the rest of the instruction set,
where MXCSR is always effective (unless otherwise specified).
Also, it's an unnecessary obstacle to the few brave souls that use
fenv.h with LLVM.

Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead.

llvm-svn: 259448
2016-02-02 01:32:50 +00:00
Sanjay Patel c54600dbb1 fix typos; NFC
llvm-svn: 259438
2016-02-01 23:53:35 +00:00
Simon Pilgrim 5be17b6e3e [X86][AVX512] Add support for AVX512 VMOVD (load) shuffle decoding
llvm-svn: 259430
2016-02-01 23:04:05 +00:00
Simon Pilgrim f5c23ad3d7 [X86][AVX512] Add support for AVX512 VMOVSD/VMOVSS shuffle decoding
llvm-svn: 259427
2016-02-01 22:26:28 +00:00
Simon Pilgrim 025a3d857a [X86][AVX512] Add support for AVX512 VINSERTPS shuffle decoding
llvm-svn: 259420
2016-02-01 22:05:50 +00:00
Matthias Braun 3f88eabe93 SmallSet/SmallPtrSet: Refuse huge Small numbers
These sets do linear searching in small mode; It is not a good idea to
use huge numbers as the small value here, save people from themselves by
adding a static_assert.

Differential Revision: http://reviews.llvm.org/D16706

llvm-svn: 259419
2016-02-01 22:05:16 +00:00
Chad Rosier dbdb1d6eaf Move comments a bit closer to associated code. NFC.
llvm-svn: 259411
2016-02-01 21:38:31 +00:00
Chad Rosier 064261da16 Remove extra semicolon. NFC.
llvm-svn: 259402
2016-02-01 20:54:36 +00:00
Balaram Makam 92431703d7 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Geoff Berry 29d4a695f4 [AArch64] Simplify prolog/epilog callee save/restore. NFC.
Summary:
Factor out common code for callee-save register pair calculation.  This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Depends on D16732

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16734

llvm-svn: 259384
2016-02-01 19:07:06 +00:00
Ulrich Weigand 4a4d4ab7a4 [SystemZ] Fix wrong-code generation for certain always-false conditions
We've found another bug in the code generation logic conditions for a
certain class of always-false conditions, those of the form
   if ((a & 1) < 0)

These only reach the back end when compiling without optimization.

The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
   if ((a & MASK) < VAL)
as
   if ((a & MASK) == 0)

where VAL is less than the the lowest bit of MASK.  This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.

Fixed by excluding that particular case.

llvm-svn: 259381
2016-02-01 18:31:19 +00:00
Colin LeMahieu 6fdfa3dc32 [NFC] Referencing manual for reason why subregbit is checked
llvm-svn: 259380
2016-02-01 18:15:39 +00:00
Geoff Berry 04bf91a8c1 [AArch64] Simplify callee-save register save/restore. NFC.
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.

This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16732

llvm-svn: 259365
2016-02-01 16:29:19 +00:00
Asaf Badouh 5a3a0231f4 [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Daniel Sanders f8bb23e509 [mips] Range check uimm16 and fix several bugs this revealed.
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq do.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
  on architecture. I'll fix the simm64 case properly when I reach simm32.

lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15446

llvm-svn: 259360
2016-02-01 15:13:31 +00:00
JF Bastien a5b8ea0d66 WebAssembly NFC: simplify control flow
This should now be easier to read.

llvm-svn: 259349
2016-02-01 10:46:16 +00:00
Igor Breger 56b039ea17 AVX512: fix mask handling for gather/scatter/prefetch intrinsics.
Differential Revision: http://reviews.llvm.org/D16755

llvm-svn: 259346
2016-02-01 09:57:15 +00:00
Simon Pilgrim 1358d86659 [X86][SSE] Find source of the inserted element of INSERTPS
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.

Differential Revision: http://reviews.llvm.org/D16652

llvm-svn: 259343
2016-02-01 08:59:30 +00:00
Igor Breger 6cc9115cec AVX512 : Fix SETCCE lowering for KNL 32 bit.
Differential Revision: http://reviews.llvm.org/D16752

llvm-svn: 259342
2016-02-01 07:56:09 +00:00
David Majnemer efb41741f2 [X86] Cleanup the WinEHState pass
Remove unnecessary includes and class state.

No functional change intended.

llvm-svn: 259340
2016-02-01 04:28:59 +00:00
Craig Topper 3ef74f5956 Replace usages of llvm::utostr_32 with just llvm::utostr. While this is less efficient, its unclear the few places that were using the _32 version were doing so for efficiency.
llvm-svn: 259330
2016-01-31 20:00:24 +00:00
JF Bastien 578c8cde53 WebAssembly: more failures are gone
llvm-svn: 259321
2016-01-31 08:19:40 +00:00
JF Bastien ac9e8664a4 WebAssembly: update expected failures
r259305 fixed a few assertions around FrameIndex, and I forgot to update these failures despite having run the torture tests.

llvm-svn: 259320
2016-01-31 08:05:05 +00:00
Derek Schuff c97ba939d1 [WebAssembly] Fix uses of FrameIndex as store values
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.

llvm-svn: 259306
2016-01-30 21:43:08 +00:00
JF Bastien fbc89d21dd WebAssembly: don't optimize frameindex store
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.

This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.

llvm-svn: 259305
2016-01-30 14:11:26 +00:00
JF Bastien 3ca3ea690f WebAssembly NFC: fix build warning
WebAssemblyFrameLowering.cpp:158:44: warning: enumeral and non-enumeral type in conditional expression [enabled by default]

llvm-svn: 259303
2016-01-30 11:19:26 +00:00
Matt Arsenault e013246462 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault d0799df707 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00
Dan Gohman ed0f113885 [WebAssembly] Refine block placement to insert blocks between trees.
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins, so we can place a block
at that point, rather than continuing to find the first instruction not in
a tree at all.

llvm-svn: 259294
2016-01-30 05:01:06 +00:00
Matt Arsenault 43976df0da AMDGPU: Add new amdgcn workitem intrinsics
These use the correct prefix and follow the HSA naming convention
rather than the config register option names.

llvm-svn: 259293
2016-01-30 04:25:19 +00:00
Matthias Braun b30f2f5141 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00
Justin Lebar ead59f4765 [CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.
Summary: Previously we'd just silently skip these.

Reviewers: tra, jholewinski

Subscribers: llvm-commits, jhen, echristo,

Differential Revision: http://reviews.llvm.org/D16739

llvm-svn: 259279
2016-01-30 01:07:38 +00:00
Yaron Keren eb2a25467e Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment.
clang part in r259232, this is the LLVM part of the patch.

llvm-svn: 259240
2016-01-29 20:50:44 +00:00
Tim Northover c4093c3ced ARM: don't mangle DAG constant if it has more than one use
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.

I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.

llvm-svn: 259228
2016-01-29 19:18:46 +00:00
Derek Schuff d91a12ec11 [WebAssembly] Update test expectations
llvm-svn: 259223
2016-01-29 18:54:38 +00:00
Derek Schuff 6ea637af35 [WebAssembly] Support frame pointer
Add support for frame pointer use in prolog/epilog.
Supports dynamic allocas but not yet over-aligned locals.
Target-independend CG generates SP updates, but we still need to write
back the SP value to memory when necessary.

llvm-svn: 259220
2016-01-29 18:37:49 +00:00
Zoran Jovanovic d474ef3a3b [mips] Absolute value macro expansion
Author: obucina
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D16323

llvm-svn: 259202
2016-01-29 16:18:34 +00:00
Alexandros Lamprineas 8c26e7c647 [ARM] Emit trap instruction using .inst directive
The trap instruction is emitted as a data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.

Differential Revision: http://reviews.llvm.org/D16684

llvm-svn: 259182
2016-01-29 10:23:32 +00:00
Matt Arsenault 295875efda AMDGPU: Remove 24-bit intrinsics
The known bit matching code seems to work reasonably well,
so these shouldn't really be needed.

llvm-svn: 259180
2016-01-29 10:05:16 +00:00
Eric Christopher 7d9b9b2d7d Refactor common code for PPC fast isel load immediate selection.
llvm-svn: 259178
2016-01-29 07:20:30 +00:00
Eric Christopher 5a2429e239 Since LI/LIS sign extend the constant passed into the instruction we should
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.

Fixes PR26356.

llvm-svn: 259177
2016-01-29 07:20:01 +00:00
Eric Christopher 80ba58a15c Fix up conditional formatting.
llvm-svn: 259176
2016-01-29 07:19:49 +00:00
David Majnemer f2bb710da5 [WinEH] Don't perform state stores in cleanups
Our cleanups do not support true lexical nesting of funclets which
obviates the need to perform state stores.

This fixes PR26361.

llvm-svn: 259161
2016-01-29 05:33:15 +00:00
Ahmed Bougacha 53010a0d5b [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Matt Arsenault 5b39b34ca5 AMDGPU: Match fmed3 patterns with legacy fmin/fmax
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault 7293f9895e AMDGPU: Set DX10Clamp bit
llvm-svn: 259088
2016-01-28 20:53:35 +00:00
Tom Stellard 3d2c852958 AMDGPU: waitcnt operand fixes
Summary:
Allow lgkmcnt up to 0xF (hardware allows that).
Fix mask for ExpCnt in AMDGPUInstPrinter.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16314

Patch by: Nikolay Haustov

llvm-svn: 259059
2016-01-28 17:13:44 +00:00
Mitch Bodart e5cadbbcdd [X86] Test commit, fixed typos in comments. NFC.
llvm-svn: 259057
2016-01-28 16:40:51 +00:00
Tom Stellard 2ff726272a AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16609

llvm-svn: 259054
2016-01-28 16:04:37 +00:00
Chad Rosier 3ada75f7e8 [AArch64] Set MMOs on pre- and post-index instructions.
Without the MMOs the MI scheduler is unable to reason about the dependencies of
these instructions.

llvm-svn: 259052
2016-01-28 15:38:24 +00:00
Simon Pilgrim de16172d9d [x86] Merge multiple calls to DAG.getTargetLoweringInfo(). NFC.
llvm-svn: 259050
2016-01-28 15:29:11 +00:00
Oliver Stannard 02fa1c80c4 Revert r259035, it introduces a cyclic library dependency
llvm-svn: 259045
2016-01-28 13:19:47 +00:00
Igor Breger fca0a34398 AVX512: Fix truncate v32i8 to v32i1 lowering implementation.
Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions.

Differential Revision: http://reviews.llvm.org/D16531

llvm-svn: 259044
2016-01-28 13:19:25 +00:00
Benjamin Kramer 16e0f147a9 Unbreak the wasm backend again after r259035.
llvm-svn: 259040
2016-01-28 11:26:34 +00:00
Zoran Jovanovic 838eabcd46 [mips][microMIPS] Disable FastISel for microMIPS
Author: milena.vujosevic.janicic
Reviewers: dsanders

FastIsel is not supported for microMIPS, thus it needs to be disabled. 
Test micromips-zero-mat-uses.ll is deleted since the tested sequence of instructions is not generated for microMIPS without FastISel.
Differential Revision: http://reviews.llvm.org/D15892

llvm-svn: 259039
2016-01-28 11:08:03 +00:00
Oliver Stannard b4b092ea1b Add backend dignostic printer for unsupported features
Re-commit of r258951 after fixing layering violation.

The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.

In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 259035
2016-01-28 10:07:27 +00:00
Simon Pilgrim d3b78430d1 [X86][SSE] Move setTargetShuffleZeroElements closer to getTargetShuffleMask. NFCI.
Keep target shuffle mask helper functions closer together.

llvm-svn: 259034
2016-01-28 09:45:01 +00:00
Asaf Badouh 42852d99e7 [X86][AVX512] small fix in ptestm intrinsics
move ptestm{q|d} intrinsics from patterns form (in td file) to the intrinsics table

Differential Revision: http://reviews.llvm.org/D16633

llvm-svn: 259029
2016-01-28 08:33:22 +00:00
JF Bastien 1e02c70ba3 WebAssembly: fix build
r259016 didn't also revert r258957 which broken the WebAssembly build.

llvm-svn: 259020
2016-01-28 05:05:17 +00:00
NAKAMURA Takumi 628a7a0aef Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features"
It broke layering violation in LLVMIR.

clang r258950 "Add backend dignostic printer for unsupported features"
llvm  r258951 "Refactor backend diagnostics for unsupported features"

llvm-svn: 259016
2016-01-28 04:41:32 +00:00
Dan Gohman fbfe5ec4a4 [WebAssembly] Don't stackify a register def past a get_local use in the same tree.
llvm-svn: 259013
2016-01-28 03:59:09 +00:00
Dan Gohman adf28177eb [WebAssembly] Enhanced register stackification
This patch revamps the RegStackifier pass with a new tree traversal mechanism,
enabling three major new features:

 - Stackification of values with multiple uses, using the result value of set_local
 - More aggressive stackification of instructions with side effects
 - Reordering operands in commutative instructions to enable more stackification.

llvm-svn: 259009
2016-01-28 01:22:44 +00:00
Adam Nemet dadfbb52f7 [TTI] Add getPrefetchDistance from PPCLoopDataPrefetch, NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

As it was discussed in the above thread, getPrefetchDistance is
currently using instruction count which may change in the future.

llvm-svn: 258995
2016-01-27 22:21:25 +00:00
Derek Schuff 4dd6778660 [WebAssembly] Implement byval arguments
Summary:
Just does the simple allocation of a stack object and passes
a pointer to the callee.

Differential Revision: http://reviews.llvm.org/D16610

llvm-svn: 258989
2016-01-27 21:17:39 +00:00
Tim Northover 042a6c1fe1 ARMv7k: base ABI decision on v7k Arch rather than watchos OS.
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.

llvm-svn: 258975
2016-01-27 19:32:29 +00:00
Benjamin Kramer 391be792f2 One more batch of self-containing headers.
llvm-svn: 258974
2016-01-27 19:29:56 +00:00
Benjamin Kramer b32a5042bd Don't put classes in headers into anonymous namespaces.
You want ODR violations? That's how you get ODR violations.

llvm-svn: 258973
2016-01-27 19:29:42 +00:00
Benjamin Kramer c8be5be968 Unbreak wasm build after r258951.
llvm-svn: 258957
2016-01-27 18:03:40 +00:00
Benjamin Kramer 45275a4d3c Make more headers self-contained.
A lot of this comes from the new complete type requirement of DenseMap.

llvm-svn: 258956
2016-01-27 18:03:37 +00:00
Oliver Stannard 1e67a9f196 Refactor backend diagnostics for unsupported features
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 258951
2016-01-27 17:30:33 +00:00
Benjamin Kramer f9172fd4ac Rename TargetSelectionDAGInfo into SelectionDAGTargetInfo and move it to CodeGen/
It's a SelectionDAG thing, not a Target thing.

llvm-svn: 258939
2016-01-27 16:32:26 +00:00
Benjamin Kramer 820f7548a1 Make some headers self-contained, remove unused includes that violate layering.
llvm-svn: 258937
2016-01-27 16:05:37 +00:00
Tom Stellard 6e3b14de62 AMDGPU/SI: Fix commuting of 32-bit VOPC instructions
Summary:
We didn't have entries in the commuting table for the 32-bit
instructions.  I don't think we hit this problem now, but we
will once uniform branching is enabled.  Tests will come in
a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16600

llvm-svn: 258936
2016-01-27 15:53:52 +00:00
Benjamin Kramer d477e9e378 Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed."
and "Add a missing test case for r258847."

This reverts commit r258847, r258848. Causes miscompilations and backend
errors.

llvm-svn: 258927
2016-01-27 12:44:12 +00:00
Marek Olsak e86f252209 AMDGPU/SI: Stoney has only 16 LDS banks
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.

Reviewers: tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16485

llvm-svn: 258922
2016-01-27 11:19:45 +00:00
Benjamin Kramer b3e8a6d2b8 Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs.
llvm-svn: 258917
2016-01-27 10:01:28 +00:00
Igor Breger b1bd47ca1a AVX512: Fix vpmovzxbw predicate for AVX1/2 instructions.
Differential Revision: http://reviews.llvm.org/D16595

llvm-svn: 258915
2016-01-27 08:57:46 +00:00
Igor Breger d6c187b038 AVX512: Add store mask patterns.
Differential Revision: http://reviews.llvm.org/D16596

llvm-svn: 258914
2016-01-27 08:43:25 +00:00
Matt Arsenault b22828f2fb AMDGPU: Fix default device handling
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.

Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.

Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.

Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.

llvm-svn: 258901
2016-01-27 02:17:49 +00:00
Reid Kleckner 5b4637141e [llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names
This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.

I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.

This change really ends the story of PR11951.

llvm-svn: 258897
2016-01-27 01:43:12 +00:00
Reid Kleckner 1c93b4cd7b [llvm-tblgen] Stop emitting the intrinsic name matching code
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.

llvm-svn: 258875
2016-01-26 23:01:21 +00:00
Derek Schuff 90d9e8d370 [WebAssembly] Omit no-op adds for non-mem uses of FrameIndex
Differential Revision: http://reviews.llvm.org/D16554

llvm-svn: 258872
2016-01-26 22:47:43 +00:00
Sanjay Patel 06fe9183b0 [x86] make the subtarget member a const reference, not a pointer ; NFCI
It's passed in as a reference; it's not optional; it's not a pointer.

llvm-svn: 258867
2016-01-26 22:08:58 +00:00
Simon Pilgrim 00adc1e105 [X86] Add support for zeroed shuffle elements to getShuffleScalarElt
Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads.

llvm-svn: 258865
2016-01-26 21:39:25 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Derek Schuff e7305cc4b3 [WebAssembly] Remove check for FrameIndex operands in WebAssemblyPeephole
This pass runs after FrameIndex elimination, so it should never see FI
operands. NFC

llvm-svn: 258860
2016-01-26 21:08:27 +00:00
Sanjay Patel 3e1701da29 [x86] add materializeVectorConstant() helper function; NFC
LowerBUILD_VECTOR is still over 300 lines long, but it's a start...

llvm-svn: 258858
2016-01-26 21:05:00 +00:00
JF Bastien 43436716aa WebAssembly NFC: update error message
I forgot to update this one in my previous patch.

llvm-svn: 258853
2016-01-26 20:24:51 +00:00
JF Bastien 1a6c7608b1 WebAssembly: don't optimize memcpy/memmove/memcpy to frame index
r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases.

llvm-svn: 258851
2016-01-26 20:22:42 +00:00
Cong Hou 551a57f797 Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 258847
2016-01-26 20:08:01 +00:00
Sanjay Patel 70fa79fdf2 [x86] simplify getOnesVector() ; NFCI
Let DAG.getConstant() handle the splatting; there's no need
to repeat that logic here.

llvm-svn: 258833
2016-01-26 18:49:36 +00:00
Eugene Zelenko 6ac3f739ca Fix Clang-tidy modernize-use-nullptr and modernize-use-override warnings; other minor fixes.
Differential revision: reviews.llvm.org/D16568

llvm-svn: 258831
2016-01-26 18:48:36 +00:00
Benjamin Kramer c50b89070c Update wasm target for r258819.
llvm-svn: 258827
2016-01-26 18:21:38 +00:00
Benjamin Kramer f57c1977c1 Reflect the MC/MCDisassembler split on the include/ level.
No functional change, just moving code around.

llvm-svn: 258818
2016-01-26 16:44:37 +00:00
Dan Gohman fb619e9686 [WebAssembly] Fix a typo in a comment.
llvm-svn: 258810
2016-01-26 14:55:17 +00:00
Simon Pilgrim 46696ef93c [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).

It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.

After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.

Differential Revision: http://reviews.llvm.org/D16217

llvm-svn: 258798
2016-01-26 09:30:08 +00:00
Craig Topper b9c932f26e [X86] Mark LDS/LES as not being allowed in 64-bit mode.
Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them.

llvm-svn: 258793
2016-01-26 06:10:15 +00:00
Matt Arsenault bee7575e1a AMDGPU: Move AMDGPU intrinsics only used by R600
llvm-svn: 258790
2016-01-26 04:49:24 +00:00
Matt Arsenault 382d945d16 AMDGPU: Tidy minor td file issues
Make comments and indentation more consistent.

Rearrange a few things to be in a more consistent order,
such as organizing subtarget features from those describing
an actual device property, and those used as options.

llvm-svn: 258789
2016-01-26 04:49:22 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00
Matt Arsenault 018179fc46 AMDGPU: Remove old sample intrinsics
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.

I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.

llvm-svn: 258787
2016-01-26 04:38:08 +00:00
Matt Arsenault 051d6f9fde AMDGPU: Add new amdgcn intrinsics for cube instructions
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.

llvm-svn: 258786
2016-01-26 04:29:56 +00:00
Matt Arsenault 9a10cea7fb AMDGPU: Implement read_register and write_register intrinsics
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.

Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.

llvm-svn: 258785
2016-01-26 04:29:24 +00:00
Matt Arsenault 0c3e2338fe AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now
Also move into backend intrinsics to discourage use of the old name.

llvm-svn: 258783
2016-01-26 04:14:16 +00:00
Dan Gohman bdf08d5da6 [WebAssembly] Optimize memcpy/memmove/memcpy calls.
These calls return their first argument, but because LLVM uses an intrinsic
with a void return type, they can't use the returned attribute. Generalize
the store results pass to optimize these calls too.

llvm-svn: 258781
2016-01-26 04:01:11 +00:00
Dan Gohman be6f196bff [WebAssembly] Remove a completed entry from the README.txt.
llvm-svn: 258780
2016-01-26 03:43:48 +00:00
Dan Gohman bb3722430f [WebAssembly] Implement unaligned loads and stores.
Differential Revision: http://reviews.llvm.org/D16534

llvm-svn: 258779
2016-01-26 03:39:31 +00:00
Reid Kleckner 86ff2689a5 Sort intrinsics by LLVM intrinsic name, rather than tablegen def name
Step one towards using a simple binary search to lookup intrinsic IDs
instead of our crazy table generated switch+memcmp+startswith code that
makes Function.cpp take about a minute to compile.  See PR24785 and
PR11951 for why we should do this.

The X86 backend contains tables that need to be sorted on intrinsic ID,
so reorder those.

llvm-svn: 258757
2016-01-26 00:55:00 +00:00
Matthias Braun 4e67e5c91a X86ISelLowering: Fix cmov(cmov) special lowering bug
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.

If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.

This fixes http://llvm.org/PR26256 (= rdar://24329747)

llvm-svn: 258729
2016-01-25 22:08:25 +00:00
Simon Pilgrim d1d118097d [X86][AVX] Add commutation support for VPERM2X128 instructions
Its main use is to allow memory folding of the 1st operand

Differential Revision: http://reviews.llvm.org/D16521

llvm-svn: 258726
2016-01-25 21:51:34 +00:00
Dan Gohman 899cb5ab7b [WebAssembly] Fix unbalanced register stack code in the case of late DCE.
Instructions can be DCE'd after the RegStackify pass. If the instruction which
would be the pop for what would be a push is removed, don't use a push.

llvm-svn: 258694
2016-01-25 16:48:44 +00:00
Dan Gohman ec977b07a8 [WebAssembly] Minor code formatting cleanups. NFC.
llvm-svn: 258692
2016-01-25 15:12:05 +00:00
Michael Zuckerman 1bd7f993fc [AVX512] Adding PTESTNMB/D/W/Q instruction
Differential Revision: http://reviews.llvm.org/D16520

llvm-svn: 258688
2016-01-25 14:43:23 +00:00
Michael Zuckerman 19670d479a [AVX512] Adding PTESTMB/W/D/Q instruction
Differential Revision: http://reviews.llvm.org/D16519

llvm-svn: 258686
2016-01-25 13:27:32 +00:00
Bradley Smith d27a6a7072 [ARM] Add DSP build attribute and extension targeting
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258683
2016-01-25 11:26:11 +00:00
Bradley Smith f277c8a5ea [ARM] Add new system registers to ARMv8-M Baseline/Mainline
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258682
2016-01-25 11:25:36 +00:00
Bradley Smith fed3e4ac00 [ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline
This patch was originally committed as r257883, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.

llvm-svn: 258681
2016-01-25 11:24:47 +00:00
Asaf Badouh 655822ab7e [X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
 VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators

Differential Revision: http://reviews.llvm.org/D16407

llvm-svn: 258680
2016-01-25 11:14:24 +00:00
Oliver Stannard 65b85382f6 [ARM] Add ARMv8.2-A FP16 scalar instructions
This was originally committed as r255762, but reverted as it broke windows
bots. Re-commitiing the exact same patch, as the underlying cause was fixed by
r258677.

ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.

These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.

Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.

New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.

Differential Revision: http://reviews.llvm.org/D15038

llvm-svn: 258678
2016-01-25 10:26:26 +00:00
Oliver Stannard 7772f023b5 [TableGen] Fix sort order of asm operand classes
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796.

The previous implementation of ClassInfo::operator< allowed cycles of classes
such that x < y < z < x, meaning that a list of them cannot be correctly
sorted, and the sort order could differ with different standard libraries.

The original implementation sorted classes by ValueName if they were otherwise
equal. This isn't strictly necessary, but some backends seem to accidentally
rely on it. If I reverse this comparison I get 8 test failures spread across
the AArch64, Mips and X86 backends, so I have left it in until those backends
can be fixed.

There was one case in the X86 backend where the observable behaviour of the
assembler is changed by this patch. This was because some of the memory asm
operands were not marked as children of X86MemAsmOperand.

Differential Revision: http://reviews.llvm.org/D16141

llvm-svn: 258677
2016-01-25 10:20:19 +00:00
Junmo Park 3ca3e192d0 Silence a -Wparentheses warning; NFC.
llvm-svn: 258676
2016-01-25 10:17:17 +00:00
Igor Breger 6d421419db AVX1 : Enable vector masked_load/store to AVX1.
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).

Differential Revision: http://reviews.llvm.org/D16528

llvm-svn: 258675
2016-01-25 10:17:11 +00:00
Michael Zuckerman 72b7223ae6 [AVX512] [CMPPS ][ CMPPD ] Adding full Comparison Predicate names
X86AsmParser.cpp is missing full comparison predicate names for CMPPD and CMPPS Instructions.
X86AsmParser.cpp defines only the short names of the Comparison predicate that you can find in the following pdf:
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Page 5-61 table 5-3

Differential Revision: http://reviews.llvm.org/D16518

llvm-svn: 258671
2016-01-25 08:43:26 +00:00
Elena Demikhovsky 29cde35b43 Added Skylake client to X86 targets and features
Changes in X86.td:

I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X ..
I added Skylake client processor and defined it's features
FeatureADX was missing on KNL
Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others

Differential Revision: http://reviews.llvm.org/D16357

llvm-svn: 258659
2016-01-24 10:41:28 +00:00
Igor Breger 1e5bafbc82 AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16137

llvm-svn: 258657
2016-01-24 08:04:33 +00:00
Simon Pilgrim 0423b382d3 [X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC.
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.

llvm-svn: 258645
2016-01-23 22:02:48 +00:00
Justin Lebar 3a5f5798a1 [CUDA] Die gracefully when trying to output an LLVM alias.
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke.  Now we die before emitting any invalid code.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, jhen, tra

Differential Revision: http://reviews.llvm.org/D16490

llvm-svn: 258638
2016-01-23 21:12:20 +00:00
Justin Lebar 2a161f986f [CUDA] Make empty parameter lists in nvptx function decls easier to read.
Summary:
Before:

  .func  (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv(

  )
  {

After:

  .func  (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv()
  {

Reviewers: bkramer

Subscribers: llvm-commits, tra, jhen, echristo, jholewinski

Differential Revision: http://reviews.llvm.org/D16512

llvm-svn: 258637
2016-01-23 21:12:17 +00:00
Aaron Ballman add830b5d1 Silence a -Wparentheses warning; NFC.
llvm-svn: 258626
2016-01-23 15:42:21 +00:00
Simon Pilgrim ead22d095e Added missing comment. NFC.
llvm-svn: 258624
2016-01-23 14:38:02 +00:00
Simon Pilgrim fd66169341 [X86][SSE] Remove INSERTPS dependencies from unreferenced operands.
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.

llvm-svn: 258622
2016-01-23 13:37:07 +00:00
Matthias Braun 327bca776c Inline variable into assert
Seems like some compilers still give unused variable warnings for
bool var = ...;
(void)var;
so I have to inline the variable.

llvm-svn: 258619
2016-01-23 06:49:29 +00:00
NAKAMURA Takumi 9974fa9c8c AArch64ISelLowering.cpp: Fix a warning. [-Wunused-variable]
llvm-svn: 258618
2016-01-23 06:34:59 +00:00
Manuel Jacob 45cc9bb581 Put space after pointer type in test. NFC.
llvm-svn: 258615
2016-01-23 05:47:34 +00:00
Matt Arsenault 7713162c32 AMDGPU: Remove more unused intrinsics
Replace tests with lrp with basic IR expansion

llvm-svn: 258612
2016-01-23 05:42:38 +00:00
Matt Arsenault f75257aaa6 AMDGPU: Move amdgcn intrinsic handling into SITargetLowering
llvm-svn: 258608
2016-01-23 05:32:20 +00:00
Matt Arsenault f1341406bf AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg
This has side effects.

llvm-svn: 258607
2016-01-23 05:32:18 +00:00
Matt Arsenault 2a93bb6365 AMDGPU: Remove Feature64BitPtr
This is a leftover from AMDIL that doesn't do anything
and doesn't belong here.

llvm-svn: 258606
2016-01-23 05:32:14 +00:00
Matthias Braun fdef49b183 AArch64ISel: Fix ccmp code selection matching deep expressions.
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.

Also rename some variable to better reflect their usage.

llvm-svn: 258605
2016-01-23 04:05:22 +00:00
Matthias Braun 985bdf9084 AArch64ISelLowering: Reduce maximum recursion depth of isConjunctionDisjunctionTree()
This function will exhibit exponential runtime (2**n) so we should
rather use a lower limit.

llvm-svn: 258604
2016-01-23 04:05:18 +00:00
Matthias Braun fd13c14669 Fix wrong indentation
llvm-svn: 258603
2016-01-23 04:05:16 +00:00
Derek Schuff 65194682e9 [WebAssembly] Fix RegNumbering for the stack pointer
Previously it failed to add NumArgRegs to the offset and so clobbered an
already-used register. Now just start the numbering after the arg regs
and don't duplicate the add. Test coverage for this coming shortly with
the implementation of byval.

llvm-svn: 258597
2016-01-23 01:20:43 +00:00
Sanjay Patel c4efadb665 fix typos; NFC
llvm-svn: 258567
2016-01-22 22:09:41 +00:00
Matt Arsenault 10ca39ca8b AMDGPU: Add new name for barrier intrinsic
llvm-svn: 258558
2016-01-22 21:30:43 +00:00
Matt Arsenault bef34e21c7 AMDGPU: Rename intrinsics to use amdgcn prefix
The intrinsic target prefix should match the target name
as it appears in the triple.

This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.

llvm-svn: 258557
2016-01-22 21:30:34 +00:00
Matt Arsenault 0b783ef076 AMDGPU: Fix crash with invariant markers
The promote alloca pass didn't handle these intrinsics and crashed.
These intrinsics should accept any address space, but for now just
erase them to avoid breaking.

llvm-svn: 258537
2016-01-22 19:47:54 +00:00
Jingyue Wu 585ec8671d [NVPTX] expand mul_lohi to mul_lo and mul_hi
Summary: Fixes PR26186.

Reviewers: grosser, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16479

llvm-svn: 258536
2016-01-22 19:47:26 +00:00
Ahmed Bougacha 78d6efdb93 [AArch64] Simplify emitConditionalCompare calls. NFC.
Now that both callsites are identical, we can simplify the
prototype and make it easier to reason about the 2-CC case.

llvm-svn: 258534
2016-01-22 19:43:57 +00:00
Ahmed Bougacha 99209b90a4 [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.

Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
  %cc1 = fcmp one float %a, %b
  %cc2 = fcmp olt float %c, %d
  %and = and i1 %cc1, %cc2
  %r = select i1 %and, i32 %e, i32 %f
  ret i32 %r
}

Assuming (%a < %b) and (%c < %d); we used to do:
  fcmp  s0, s1            # nzcv <- 1000
  orr   w8, wzr, #0x1     # w8 <- 1
  csel  w9, w8, wzr, mi   # w9 <- 1
  csel  w8, w8, w9, gt    # w8 <- 1
  fcmp  s2, s3            # nzcv <- 1000
  cset   w9, mi           # w9 <- 1
  tst    w8, w9           # (w8 & w9) == 1, so: nzcv <- 0000
  csel  w0, w0, w1, ne    # w0 <- w0

We now do:
  fcmp  s2, s3            # nzcv <- 1000
  fccmp s0, s1, #0, mi    #  mi, so: nzcv <- 1000
  fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
  csel  w0, w0, w1, pl    # !pl, so: w0 <- w1

In other words, we transformed:
  (c < d) &&  ((a < b) || (a > b))
into:
  (c < d) &&   (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
  (c < d) && !((a u>= b) && (a u<= b))

Note that this problem doesn't occur in the test-suite.

changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
  In general we can create code for arbitrary "... (and (and A B) C)"
  sequences.  We can also implement some "or" expressions, because
  "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
  implement some  negation operations. [...] However there is no way
  to negate the result of a partial sequence.

Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
    == ((a olt b) || (a ogt b))
    == ((a ord b) && (a une b))
- (a ueq b)
    == ((a uno b) || (a oeq b))
    == ((a ule b) && (a uge b))

Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
  (a || b)
  = !(!a && !(b))
  = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
  = !(!a && !b1 && !b2)

However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:

  (a || b)
  = !(!a && (!b))
  = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
  = !(!a && b1 && b2)

This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.

llvm-svn: 258533
2016-01-22 19:43:54 +00:00
Ahmed Bougacha 6345b9ecfa [AArch64] Assert that CCMP isel didn't fail inconsistently.
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.

llvm-svn: 258532
2016-01-22 19:43:43 +00:00
Krzysztof Parzyszek 7b413c6c63 [Hexagon] Use general purpose registers to spill pred/mod registers into
Patch by Tobias Edler Von Koch.

llvm-svn: 258527
2016-01-22 19:15:58 +00:00
Matt Arsenault 59bd3014f2 AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix
These ones aren't directly emitted by mesa and inserted by a pass.

llvm-svn: 258523
2016-01-22 19:00:09 +00:00
Matt Arsenault bb4ff5f5b6 AMDGPU: Remove unused R600 intrinsics
llvm-svn: 258522
2016-01-22 18:52:14 +00:00
Matt Arsenault 7898b90ee1 AMDGPU: Change control flow intrinsics to use amdgcn prefix
These aren't supposed to be used outside of the backend,
so there aren't any users to worry about.

llvm-svn: 258516
2016-01-22 18:42:55 +00:00
Matt Arsenault 8d903029e8 AMDGPU: Don't use separate mulhu/mulhs Pats
llvm-svn: 258515
2016-01-22 18:42:49 +00:00
Matt Arsenault ee0930821a AMDGPU: Remove random TGSI intrinsic
I don't think this was ever used.

llvm-svn: 258514
2016-01-22 18:42:44 +00:00
Matt Arsenault 0cbaa1762b AMDGPU: Remove AMDGPU.fract intrinsic
Mesa doesn't use this, and this is pattern matched already
from fsub x, (ffloor x)

llvm-svn: 258513
2016-01-22 18:42:38 +00:00
JF Bastien 4383a34268 NFC WebAssembly: update links
I got a vanity URL, and moved the github waterfall repo.

llvm-svn: 258484
2016-01-22 04:21:49 +00:00
Pirama Arumuga Nainar 71e9a2a4c4 Do not lower VSETCC if operand is an f16 vector
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type.  This patch skips
lowering of VSETCC if the operand is an f16 vector.

v4 and v8 tests included.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15361

llvm-svn: 258471
2016-01-22 01:16:57 +00:00
Simon Pilgrim 5ba1c127fc [X86][SSE] Improve i16 splatting shuffles
Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.

Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.

Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.

Differential Revision: http://reviews.llvm.org/D14901

llvm-svn: 258440
2016-01-21 22:07:41 +00:00
Adam Nemet af761104ba [TTI] Add getCacheLineSize
Summary:
And use it in PPCLoopDataPrefetch.cpp.

@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.

Reviewers: hfinkel

Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D16306

llvm-svn: 258419
2016-01-21 18:28:36 +00:00
Scott Egerton 2455701117 [mips] Allowed dla instructions on 32-bit architectures.
Summary:
This is now the same as the behaviour of the GNU assembler. This was done
as it is required in order to build the Linux kernel with the integrated
assembler enabled.

Reviewers: dsanders, vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D13594

llvm-svn: 258400
2016-01-21 15:11:01 +00:00
Igor Breger 7a000f5bb2 AVX512: Masked move intrinsic implementation.
Implemented intrinsic for the follow instructions (reg move) : VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD.

Differential Revision: http://reviews.llvm.org/D16316

llvm-svn: 258398
2016-01-21 14:18:11 +00:00
Michael Zuckerman 21a30a42a9 [AVX512] Adding VPERMT2B and VPERMI2B Intrinsics
Differential Revision: http://reviews.llvm.org/D16398

llvm-svn: 258397
2016-01-21 13:36:01 +00:00
Krzysztof Parzyszek 14f9535eec PR26172: unnecessary indirection in HexagonCopyToCombine.cpp
llvm-svn: 258395
2016-01-21 12:45:17 +00:00
Marina Yatsina ff262fa807 [X86] - Removing warning on legal cases caused by commit r258132
There's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" or "Move or Merge Scalar Double-Precision Floating-Point Value".
The former should produce warnings when parsing a memory operand that is not ESI/EDI, but the latter should not.

Fixed the code to produce warnings only after making sure we're dealing with the first case.

Expanded the tests of the produced warnings + fixed RUN line of the test so that it would check both stdout and stderr

Differential Revision: http://reviews.llvm.org/D16359

llvm-svn: 258393
2016-01-21 11:37:06 +00:00
Tom Stellard de008d338c AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute
Summary:
Currently the SI scheduler can be selected via command line option,
but it turned out it would be better if it was selectable via a Target Attribute.

This patch adds "si-scheduler" attribute to the backend.

Reviewers: tstellarAMD, echristo

Subscribers: echristo, arsenm

Differential Revision: http://reviews.llvm.org/D16192

llvm-svn: 258386
2016-01-21 04:28:34 +00:00
Tom Stellard d1efda8e9e AMDGPU/SI: Promote i1 SETCC operations
Summary:
While working on uniform branching, I've hit a few cases where we emit
i1 SETCC operations.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16233

llvm-svn: 258352
2016-01-20 21:48:24 +00:00
Matt Arsenault 7836f895fe AMDGPU: Fix old comments that mention AMDIL
llvm-svn: 258350
2016-01-20 21:22:21 +00:00
Matt Arsenault 7ba334a7d9 AMDGPU: Remove AMDGPU.trunc intrinsic
llvm-svn: 258348
2016-01-20 21:05:53 +00:00
Matt Arsenault 15fbe49daf AMDGPU: Remove AMDIL.fraction intrinsic
llvm-svn: 258347
2016-01-20 21:05:49 +00:00
Matt Arsenault 7cccd2672e AMDGPU: Remove AMDIL.round.nearest intrinsic
llvm-svn: 258346
2016-01-20 21:05:40 +00:00
Matt Arsenault 1c9e4ef0df AMDGPU: Remove abs intrinsic
llvm-svn: 258343
2016-01-20 20:58:29 +00:00
Matt Arsenault f7e6e89718 AMDGPU: Remove min/max intrinsics
This removes support for mesa 11.0.x

llvm-svn: 258342
2016-01-20 20:50:19 +00:00
Keith Walker 8c44bf1b89 Write AArch64 big endian data fixup entries as BE.
There was support for writing the AArch64 big endian data fixup entries in
the .eh_frame section in BE.    This is changed to write all such fixup
entries in BE with no restriction on the section.  This is similar to
the existing support for fixup entries for ARM.

A test is added to check the length field in the .debug_line section as
this is an example of where such a fixup occurs.

Differential Revision: http://reviews.llvm.org/D16064

llvm-svn: 258320
2016-01-20 15:59:14 +00:00
Tom Stellard 77a177722f Correctly initialize SIAnnotateControlFlow
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16304

llvm-svn: 258319
2016-01-20 15:48:27 +00:00
Michael Zuckerman 65c40afb03 [AVX512] Adding VPERMB Intrinsics
Differential Revision: http://reviews.llvm.org/D16296

llvm-svn: 258316
2016-01-20 15:24:56 +00:00
Marina Yatsina 701938d64e Fixing bug in rL258132: [X86] Adding support for missing variations of X86 string related instructions
There was a bug in my rL258132 because there's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" (the case I wanted to handle) or "Move or Merge Scalar Double-Precision Floating-Point Value" (the case that causes the asserts).
Added  code for escaping the unfamiliar scenarios and falling back to old behviour.
Also changed the asserts to llvm_unreachable. 

llvm-svn: 258312
2016-01-20 14:03:47 +00:00
Igor Breger d3341f5021 AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16350

llvm-svn: 258309
2016-01-20 13:11:47 +00:00
Oliver Stannard f7696f8267 [AArch64] Fix two bugs in the .inst directive
The AArch64 .inst directive was implemented using EmitIntValue, which resulted
in both $x and $d (code and data) mapping symbols being emitted at the same
address. This fixes it to only emit the $x mapping symbol.

EmitIntValue also emits the value in big-endian order when targeting big-endian
systems, but instructions are always emitted in little-endian order for
AArch64.

Differential Revision: http://reviews.llvm.org/D16349

llvm-svn: 258308
2016-01-20 12:54:31 +00:00
Dan Gohman 8394756937 [WebAssembly] Minor code cleanups. NFC.
llvm-svn: 258294
2016-01-20 05:54:22 +00:00
Dan Gohman 26cf4f3689 [WebAssembly] Remove the Relooper code, as it is not currently being used.
llvm-svn: 258293
2016-01-20 05:50:29 +00:00
Dan Gohman 7e64917fd1 [WebAssembly] Don't stackify stores across instructions with side effects.
llvm-svn: 258285
2016-01-20 04:21:16 +00:00
Eduard Burtescu 23c4d83aa3 [NFC] Replace several manual GEP loops with gep_type_iterator.
Reviewers: dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16335

llvm-svn: 258262
2016-01-20 00:26:52 +00:00
Matthias Braun 5d458617aa RegisterPressure: Make liveness tracking subregister aware
Differential Revision: http://reviews.llvm.org/D14968

llvm-svn: 258258
2016-01-20 00:23:26 +00:00
Tom Stellard 2e045bbc5f AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15035

llvm-svn: 258256
2016-01-20 00:13:22 +00:00
Quentin Colombet 4cf56917ea [X86] Do not run shrink-wrapping on function with split-stack attribute or HiPE
calling convention.
The implementation of the related callbacks in the x86 backend for such
functions are not ready to deal with a prologue block that is not the entry
block of the function.

This fixes PR26107, but the longer term solution would be to fix those callbacks.

llvm-svn: 258221
2016-01-19 23:29:03 +00:00
David Majnemer ce10842036 [MC, COFF] Add .reloc support for WinCOFF
This adds rudimentary support for a few relocations that we will use for
the CodeView debug format.

llvm-svn: 258216
2016-01-19 23:05:27 +00:00
Simon Pilgrim 4b919b2ab3 [X86][SSE] Add VZEXT_MOVL target shuffle decoding.
Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines.

llvm-svn: 258215
2016-01-19 23:04:56 +00:00
Simon Pilgrim e74653b67a [X86][SSE] Add INSERTPS target shuffle combines.
As vector shuffles can only reference two inputs many (V)INSERTPS patterns end up being split over two targets shuffles.

This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.

Differential Revision: http://reviews.llvm.org/D16072

llvm-svn: 258205
2016-01-19 22:24:12 +00:00
Chad Rosier 5c72966ea3 [AArch64] Remove a bunch of useless FIXME comments.
llvm-svn: 258193
2016-01-19 21:47:24 +00:00
Dan Gohman cff798386e [WebAssembly] Remove an unused data member. NFC.
llvm-svn: 258192
2016-01-19 21:31:41 +00:00
Chad Rosier b11c82d3e2 [AArch64] Remove more dead code after r258093.
llvm-svn: 258191
2016-01-19 21:27:05 +00:00
JF Bastien 17999f20fa WebAssembly: mark known failure caused by r258125
The following test program triggers the assertion:
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.c-torture/execute/20030916-1.c

llvm-svn: 258182
2016-01-19 20:53:12 +00:00
Michael Zuckerman 4582bdab12 [AVX512] Adding VPERMT2B and VPERMI2B instruction .
Differential Revision: http://reviews.llvm.org/D16297

llvm-svn: 258161
2016-01-19 18:47:02 +00:00
Eduard Burtescu 19eb03106d [opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.

GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16275

llvm-svn: 258145
2016-01-19 17:28:00 +00:00
Michael Zuckerman d9cac592f4 [AVX512] Adding VPERMB instruction
Differential Revision: http://reviews.llvm.org/D16294

llvm-svn: 258144
2016-01-19 17:07:43 +00:00
Dan Gohman b6fd39a3a7 [WebAssembly] Rematerialize constants rather than hold them live in registers.
Teach the register stackifier to rematerialize constants that have multiple
uses instead of leaving them in registers. In the WebAssembly encoding, it's
the same code size to materialize most constants as it is to read a value
from a register.

llvm-svn: 258142
2016-01-19 16:59:23 +00:00
Chad Rosier 401a4ab8d8 Typo.
llvm-svn: 258137
2016-01-19 16:50:45 +00:00
Marina Yatsina d9658d16fd [X86] Add support for "xlat m8"
According to x86 spec "xlat m8" is a legal instruction and it is equivalent to "xlatb".

Differential Revision: http://reviews.llvm.org/D15150

llvm-svn: 258135
2016-01-19 16:35:38 +00:00
Marina Yatsina b9f4f62cfe [X86] Adding support for missing variations of X86 string related instructions
The following are legal according to X86 spec:
ins mem, DX
outs DX, mem
lods mem
stos mem
scas mem
cmps mem, mem
movs mem, mem

Differential Revision: http://reviews.llvm.org/D14827

llvm-svn: 258132
2016-01-19 15:37:56 +00:00
Dan Gohman b13c91f159 [WebAssembly] Disable some WebAssembly-specific optimization passes at -O0.
llvm-svn: 258127
2016-01-19 14:55:02 +00:00
Dan Gohman 3196650bf3 [WebAssembly] Use the templated form of MachineFunction::getSubtarget(). NFC.
llvm-svn: 258126
2016-01-19 14:53:19 +00:00
Asaf Badouh d4a0d9a78c [X86][AVX512]fix dag & add intrinsics for fixupimm
cover all width and types (pd/ps/sd/ss) of fixupimm instruction and inrtinsics

Differential Revision: http://reviews.llvm.org/D16313

llvm-svn: 258124
2016-01-19 14:21:39 +00:00
Matt Arsenault 33e3ecee0c AMDGPU: Reduce 64-bit SRAs
llvm-svn: 258096
2016-01-18 22:09:04 +00:00
Matt Arsenault 6e3a45193a AMDGPU: Split 64-bit and of constant up
This breaks the tests that were meant for testing
64-bit inline immediates, so move those to shl where
they won't be broken up.

This should be repeated for the other related bit ops.

llvm-svn: 258095
2016-01-18 22:01:13 +00:00
Chad Rosier 234bf6fe5c [AArch64] Remove unused arguments. NFC.
AFAICT, these have been unused since the initial backend import.

llvm-svn: 258093
2016-01-18 21:56:40 +00:00
Matt Arsenault 3cbbc10488 AMDGPU: Generalize shl combine
Reduce 64-bit shl with constant > 32. We already special cased
this for the == 32 case, but this also works for any >= 32 constant.

llvm-svn: 258092
2016-01-18 21:55:14 +00:00
Matt Arsenault 80edab99ff AMDGPU: Reduce 64-bit lshr by constant to 32-bit
64-bit shifts are very slow on some subtargets.

llvm-svn: 258090
2016-01-18 21:43:36 +00:00
Matt Arsenault e83690c1cc AMDGPU: Add subtarget feature for instruction rates
llvm-svn: 258085
2016-01-18 21:13:50 +00:00
Simon Pilgrim 99c6c29c0c Fixed MSVC Win64 warning of implicit conversion of 32-bit shift to 64-bits.
llvm-svn: 258084
2016-01-18 21:11:19 +00:00
Simon Pilgrim 3e5fb61978 [X86][AVX2] Broadcast subvectors
AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector its advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can directly use the immediate shuffle VPERMPD/VPERMQ directly.

Differential Revision: http://reviews.llvm.org/D16050

llvm-svn: 258081
2016-01-18 20:59:04 +00:00
Krzysztof Parzyszek 7aae9b3782 [Hexagon] Recognize more copy-equivalents in RDF optimizations
llvm-svn: 258076
2016-01-18 20:45:51 +00:00
Krzysztof Parzyszek adc64b7df0 [RDF] Improvements to copy propagation
- Allow any instruction to define equality between registers.
- Keep the DFG updated.

llvm-svn: 258075
2016-01-18 20:43:57 +00:00
Krzysztof Parzyszek e6b0662092 [RDF] Improve compile-time performance of dead code elimination
llvm-svn: 258074
2016-01-18 20:42:47 +00:00
Krzysztof Parzyszek 69e670d5f9 [RDF] Allow unlinking ref nodes from data-flow chains only
llvm-svn: 258073
2016-01-18 20:41:34 +00:00
Igor Breger 239fda676c AVX512: Masked store intrinsic implementation.
Implemented intrinsic for the follow instructions (store) : VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD.

Differential Revision: http://reviews.llvm.org/D16271

llvm-svn: 258047
2016-01-18 13:52:57 +00:00
Elena Demikhovsky 9242ea87d6 Added Cannonlake processor to X86 Target
Differential Revision: http://reviews.llvm.org/D16289

llvm-svn: 258046
2016-01-18 13:00:31 +00:00
Igor Breger dd6522c653 AVX512 : Change v8i1 bitconvert GR8 pattern, remove unnecessary movzbl instruction.
code example , previous implementation.
    movzbl  %dil, %eax
    kmovw  %eax, %k0
  new code
    kmovw  %edi, %k0

Differential Revision: http://reviews.llvm.org/D16287

llvm-svn: 258045
2016-01-18 12:02:45 +00:00
Oliver Stannard 9f68749eba [ARM] Operands for PKHTB alias should be swapped
When the shift immediate is zero, PKHTB is an alias for PKHBT, but the order of
the input operands needs to be swapped.

Differential Revision: http://reviews.llvm.org/D16288

llvm-svn: 258044
2016-01-18 11:56:35 +00:00
Manuel Jacob 190577ac81 [opaque pointer types] [NFC] CallSite: use getFunctionType() instead of going through PointerType::getElementType.
Patch by Eduard Burtescu.

Reviewers: dblaikie, mjacob

Subscribers: dsanders, llvm-commits, dblaikie

Differential Revision: http://reviews.llvm.org/D16273

llvm-svn: 258023
2016-01-17 22:37:39 +00:00
Michael Zuckerman 97b6a6923e [AVX512] adding AVXVBMI feature flag
The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions. 
More about the instruction can be found in:
hattps://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Differential Revision: http://reviews.llvm.org/D16190

llvm-svn: 258012
2016-01-17 13:42:12 +00:00
Igor Breger e1f273d900 AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic.
Differential Revision: http://reviews.llvm.org/D16184

llvm-svn: 258009
2016-01-17 12:10:24 +00:00
Michael Zuckerman ac1b238b0a [AVX512] Adding VPERMW/D/Q VPERMPS/D Intrinsics
Differential Revision: http://reviews.llvm.org/D16189

llvm-svn: 258008
2016-01-17 11:33:29 +00:00
Michael Zuckerman ede597c753 [AVX512] Adding VPERMQ VPERMPD Intrinsics
Differential Revision: http://reviews.llvm.org/D16194

llvm-svn: 258006
2016-01-17 08:32:14 +00:00
Simon Pilgrim 20f31fa31a [X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks.

Minor follow up to D15477.

llvm-svn: 258000
2016-01-16 22:30:20 +00:00
Manuel Jacob 5f6eaac611 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Manman Ren 53a54c41d7 CXX_FAST_TLS calling convention: fix issue on x86-64.
%RBP can't be handled explicitly. We generate the following code:
    pushq %rbp
    movq  %rsp, %rbp
    ...
    movq  %rbx, (%rbp)  ## 8-byte Spill
where %rbp will be overwritten by the spilled value.

The fix is to let PEI handle %RBP.
PR26136

llvm-svn: 257997
2016-01-16 16:39:46 +00:00
NAKAMURA Takumi 33ff1dda6a [Cygwin] Use -femulated-tls by default since r257718 introduced the new pass.
FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp.
FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Working in progress.
llvm-svn: 257984
2016-01-16 03:44:52 +00:00
Dan Gohman 7f86ca1803 [WebAssembly] Add some more README.txt entries.
llvm-svn: 257969
2016-01-16 00:20:03 +00:00
Kevin B. Smith c831a08fbf [X86]: Make param names in header and body match for isCalleePop.
Differential Revision: http://reviews.llvm.org/D16246

llvm-svn: 257965
2016-01-16 00:08:36 +00:00
Dan Gohman 2f301f3e92 [WebAssembly] Don't create a needless .note.GNU-stack section
WebAssembly's stack will never be executable by default, so it isn't
necessary to declare .note.GNU-stack sections to request a non-executable
stack.

Differential Revision: http://reviews.llvm.org/D15969

llvm-svn: 257962
2016-01-15 23:59:13 +00:00
Artem Belevich 5be0706ebe [NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.
llvm-svn: 257961
2016-01-15 23:57:53 +00:00
Manman Ren e5f807f928 CXX_FAST_TLS calling convention: fix issue on ARM.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

PR26136

llvm-svn: 257930
2016-01-15 20:24:11 +00:00
Manman Ren 4632e8e625 CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257929
2016-01-15 20:13:28 +00:00
Manman Ren 4fe01bd8f9 CXX_FAST_TLS calling convention: fix issue on X86-64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257925
2016-01-15 19:35:42 +00:00
Kyle Butt 132bf36161 Codegen: [PPC] Silence false-positive initialization warning. NFC
Some compilers don't do exhaustive switch checking. For those compilers,
add an initialization to prevent un-initialized variable warnings from
firing. For compilers with exhaustive switch checking, we still get a
guarantee that the switch is exhaustive, and hence the initializations
are redundant, and a non-functional change.

llvm-svn: 257923
2016-01-15 19:20:06 +00:00
Reid Kleckner d4a0d18899 Revert "[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline"
This reverts commit r257883.

Somehow this didn't make it into r257916.

llvm-svn: 257919
2016-01-15 18:55:12 +00:00
Reid Kleckner 47f2452da8 # This is a combination of 2 commits.
# The first commit's message is:

Revert "[ARM] Add DSP build attribute and extension targeting"

This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc.

# This is the 2nd commit message:

Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline"

This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5.

llvm-svn: 257916
2016-01-15 18:31:29 +00:00
Krzysztof Parzyszek 2a3b2f9841 [Hexagon] Generate CONST64 when optimizing for size in copy-to-combine
llvm-svn: 257891
2016-01-15 14:08:31 +00:00
Krzysztof Parzyszek 9b7320e621 [Hexagon] Handle DBG_VALUE instructions in copy-to-combine
llvm-svn: 257890
2016-01-15 13:55:57 +00:00
Bradley Smith 48b93e1f21 [ARM] Add DSP build attribute and extension targeting
llvm-svn: 257885
2016-01-15 10:28:25 +00:00
Bradley Smith 42f6e90a43 [ARM] Add new system registers to ARMv8-M Baseline/Mainline
llvm-svn: 257884
2016-01-15 10:28:03 +00:00
Bradley Smith 618712df04 [ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257883
2016-01-15 10:27:14 +00:00
Bradley Smith 433c22e35c [ARM] Add ARMv8-A semaphore/atomic instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257882
2016-01-15 10:26:51 +00:00
Bradley Smith a1189106d5 [ARM] Add B.W and CBZ instructions to ARMv8-M Baseline
llvm-svn: 257881
2016-01-15 10:26:17 +00:00
Bradley Smith 519563e371 [ARM] Add SDIV/UDIV instructions to ARMv8-M Baseline
llvm-svn: 257880
2016-01-15 10:25:35 +00:00
Bradley Smith d9a99ce53d [ARM] Add MOVW/MOVT instructions to ARMv8-M Baseline/Mainline
llvm-svn: 257879
2016-01-15 10:25:14 +00:00
Bradley Smith e26f799422 [ARM] Add ARMv8-M Baseline/Mainline LLVM targeting
llvm-svn: 257878
2016-01-15 10:24:39 +00:00
Bradley Smith 4c21cba72b [ARM] Split out ARMv8-A semaphores and atomics and ARMv7 clrex as separate features
llvm-svn: 257877
2016-01-15 10:23:46 +00:00
Jonas Paulsson 5b29e096ac [SystemZ] Fix bad instruction name
SLGBR -> SLBGR

Reviewed by Ulrich Weigand

llvm-svn: 257874
2016-01-15 07:12:09 +00:00
Pete Cooper 835594e627 Delete MCRelocationInfo::createExprForRelocation.
This method has no callers.

Also remove X86ELFRelocationInfo.cpp and X86MachORelocationInfo.cpp
which only existed to provide an implementation of that method.

Ok'd by Rafael and Jim.

llvm-svn: 257859
2016-01-15 02:24:12 +00:00
Weiming Zhao 038393bba0 Fix AArch64ConditionOptimizer
Summary:
This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL.
Modifying CMP will have side effect on CSEL.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16147

llvm-svn: 257844
2016-01-15 00:06:58 +00:00
Krzysztof Parzyszek 0d11212f00 [Hexagon] Use S2_lsr_i_r instead of S2_extractu to obtain upper halfword
llvm-svn: 257815
2016-01-14 21:59:22 +00:00
Krzysztof Parzyszek 5337a3e965 [Hexagon] Handle HVX registers in bit simplification
llvm-svn: 257811
2016-01-14 21:45:43 +00:00
Rui Ueyama da00f2fdf4 Update to use new name alignTo().
llvm-svn: 257804
2016-01-14 21:06:47 +00:00
Rafael Espindola c897cdde70 Handle offsets larger than 32 bits.
David Majnemer noticed that it was not obvious what the behavior would
be if B.Offset - A.Offset could not fit in an int.

llvm-svn: 257803
2016-01-14 21:03:06 +00:00
Rafael Espindola 56cb2734e3 Assert that a cmp function defines a total order.
Thanks to David Blaikie for noticing it.

llvm-svn: 257796
2016-01-14 20:28:25 +00:00
Krzysztof Parzyszek 237b96132d [Hexagon] Expand pseudo instruction Insert4
llvm-svn: 257771
2016-01-14 15:37:16 +00:00
Krzysztof Parzyszek b28ae10a16 [Hexagon] Handle branches with non-mbb operands
llvm-svn: 257768
2016-01-14 15:05:27 +00:00
Benjamin Kramer fc1f7d893e [ARM] Use the efficient version of BitVector::set and a static_assert.
No functional change intended.

llvm-svn: 257766
2016-01-14 14:33:04 +00:00
Igor Breger fc96331d88 AVX512: VMOVDQA32/64 (load) intrinsic implementation.
Differential Revision: http://reviews.llvm.org/D16142

llvm-svn: 257749
2016-01-14 07:56:04 +00:00
Ahmed Bougacha dfc77357a0 [AArch64] Don't assume extractelt constant index when matching shuffle.
llvm-svn: 257735
2016-01-14 02:12:30 +00:00
JF Bastien d1bd129d00 WebAssembly: mark a few new failures
A recent change introduced this assertion failure in some corner cases.

Repro:
mkdir /s/wasm/torture-out ; time /s/wasm/waterfall/src/compile_torture_tests.py --c /s/llvm/out/bin/clang --cxx /s/llvm/out/bin/clang++ --testsuite /s/gcc/gcc/testsuite --fails /s/llvm/llvm/lib/Target/WebAssembly/known_gcc_test_failures.txt --out /s/wasm/torture-out

Or look on the wasm integration bot:
https://build.chromium.org/p/client.wasm.llvm/console

llvm-svn: 257733
2016-01-14 01:49:22 +00:00
David Majnemer 3463e696fb [X86] Don't alter HasOpaqueSPAdjustment after we've relied on it
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it.  Things like whether or not we can use 'rep;movs' to
copy bytes around, that sort of thing.  If it changes, invariants in the
backend will quietly break.  This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.

This fixes PR26124.

llvm-svn: 257730
2016-01-14 01:20:03 +00:00
JF Bastien 664fd461c2 WebAssembly: fix build break introduced by ELFObjectWriter churn
llvm-svn: 257709
2016-01-13 23:36:00 +00:00
Rafael Espindola 8340f94df1 Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Krzysztof Parzyszek a61f7da6ba [Hexagon] Fix the options controlling jump table generation
llvm-svn: 257679
2016-01-13 21:43:13 +00:00
Changpeng Fang c16be00313 AMDGPU/SI: Update ISA version for FIJI
llvm-svn: 257666
2016-01-13 20:39:25 +00:00
Dan Gohman a39ca60126 [WebAssembly] Add an assertion to catch unexpected MCFixupKindInfo flags.
llvm-svn: 257657
2016-01-13 19:31:57 +00:00
Dan Gohman 938ff9f0aa [WebAssembly] MCFixupKindInfo's TargetSize is in bits rather than bytes.
llvm-svn: 257655
2016-01-13 19:29:37 +00:00
Hans Wennborg 81efb6b418 Fix struct/class mismatch for MachineSchedContext
llvm-svn: 257648
2016-01-13 18:59:45 +00:00
Marek Olsak 46dadbfab2 AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16037

llvm-svn: 257625
2016-01-13 17:23:20 +00:00
Marek Olsak 3c0ebc71f1 AMDGPU/SI: Remove ending s_endpgm from non-void functions
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16035

llvm-svn: 257623
2016-01-13 17:23:12 +00:00
Marek Olsak 8e9cc63bfb AMDGPU/SI: Add s_waitcnt at the end of non-void functions
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
    look more productive.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16034

llvm-svn: 257622
2016-01-13 17:23:09 +00:00
Marek Olsak 8a0f335ad6 AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Derek Schuff 9c3bf3187a [WebAssemly] Invalidate liveness in CFG stackifier
WebAssemblyCFGStackify does not track liveness for EXPR_STACK, causing
verifier failure if liveness has not already been invalidated.

llvm-svn: 257620
2016-01-13 17:10:28 +00:00
Nicolai Haehnle 02c3291566 AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609
2016-01-13 16:10:10 +00:00
Michael Zuckerman 6b35f460ac Fixing warning by adding the X86ISD::VROTRI case.
Differential Revision: http://reviews.llvm.org/D16052 

llvm-svn: 257607
2016-01-13 15:48:42 +00:00
Krzysztof Parzyszek a3c5d44437 [Hexagon] Do not insert non-phis before phis in bit simplification
llvm-svn: 257606
2016-01-13 15:48:18 +00:00
Michael Zuckerman 0e31b22487 [AVX512] Adding PMOVSXBD/W/Q , PMOVZSDQ and PMOVZSWD/Q Intrinsics .
Differential Revision: http://reviews.llvm.org/D16111 

llvm-svn: 257604
2016-01-13 14:59:19 +00:00
Michael Zuckerman 43cea85db9 [AVX512] Adding PMOVZXBD/W/Q , PMOVZXDQ and PMOVZXWD/Q Intrinsics
Differential Revision:http://reviews.llvm.org/D16071

llvm-svn: 257601
2016-01-13 14:25:21 +00:00
Ulrich Weigand 46ff7ec317 [PowerPC] Fix large code model with the ELFv2 ABI
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point.  This is always true when using the medium or small
code model, but may not be the case when using the large code model.

This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:

ld r2,-8(r12)
add r2,r2,r12

Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.

These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.

Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D15500

llvm-svn: 257597
2016-01-13 13:12:23 +00:00
Michael Zuckerman 298a680c80 [AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics
Differential Revision: http://reviews.llvm.org/D16052

llvm-svn: 257594
2016-01-13 12:39:33 +00:00
Marek Olsak 4e99b6ec01 AMDGPU/SI: Allow more shader inputs
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16032

llvm-svn: 257593
2016-01-13 11:46:48 +00:00
Marek Olsak b6c8c3d165 AMDGPU/SI: Allow any number of PS inputs
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16031

llvm-svn: 257592
2016-01-13 11:46:10 +00:00
Marek Olsak fccabaf57e AMDGPU/SI: Add new target attribute InitialPSInputAddr
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.

Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.

v2: Make PSInputAddr private, because there is never enough silly getters
    and setters for people to read.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16030

llvm-svn: 257591
2016-01-13 11:45:36 +00:00
Marek Olsak 926c56f50c AMDGPU/SI: Fix a bug in SIFoldOperands
Summary: ret.ll will contain a test for this

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16029

llvm-svn: 257590
2016-01-13 11:44:29 +00:00
Andrey Turetskiy 1ce2c9973f LEA code size optimization pass (Part 2): Remove redundant LEA instructions.
Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
(in the same basic block) which calculates address differing only be a
displacement. Works only for -Oz.

Differential Revision: http://reviews.llvm.org/D13295

llvm-svn: 257589
2016-01-13 11:30:44 +00:00
James Y Knight 7699494f08 [SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition.
AnalyzeBranch on X86 (and, previously, SPARC, which implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.

The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.

This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.

Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.

E.g., it used to emit code like this:
         bne .LBB1_2
         nop
         ba .LBB1_1
         nop
 .LBB1_2:

And now emits:
        cmp %i0, 42
        be .LBB1_1
        nop

llvm-svn: 257572
2016-01-13 04:44:14 +00:00
Ana Pazos 359cab3bb3 Guard fabs to bfc convert with V6T2 flag
Summary:
BFC instructions are available in ARMv6T2 and above.


Reviewers: t.p.northover

Subscribers: aemerson

Differential Revision: http://reviews.llvm.org/D16076

llvm-svn: 257546
2016-01-13 00:03:35 +00:00
Quentin Colombet f8e3030794 [ARM] Mark VMOV with immediate: isAsCheapAsMove.
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.

rdar://problem/23754176

llvm-svn: 257545
2016-01-13 00:02:40 +00:00
Derek Schuff 4377e2d713 [WebAssembly] Fix disassembler shared-libs build
llvm-svn: 257536
2016-01-12 23:03:40 +00:00
Dan Gohman 0656f5f845 [WebAsssembly] Register the MC register info.
llvm-svn: 257525
2016-01-12 21:27:55 +00:00
Michael Zuckerman 2ddcbcf464 [AVX512] adding PROLQ and PROLD Intrinsics
Differential Revision: http://reviews.llvm.org/D16048

llvm-svn: 257523
2016-01-12 21:19:17 +00:00
Kyle Butt cec40806f1 Codegen: [PPC] Handle weighted comparisons when inserting selects.
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.

This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.

While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.

llvm-svn: 257518
2016-01-12 21:00:43 +00:00
Dan Gohman 4635017176 [WebAssembly] Add a EM_WEBASSEMBLY value, and several bits of code that use it.
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is recieved, the value of EM_WEBASSEMBLY
will be updated.

llvm-svn: 257517
2016-01-12 20:56:01 +00:00
Dan Gohman 3469ee120c [WebAssembly] Introduce a WebAssemblyTargetStreamer class.
Refactor .param, .result, .local, and .endfunc, as directives, using the
proper MCTargetStreamer mechanism, rather than fake instructions.

llvm-svn: 257511
2016-01-12 20:30:51 +00:00
Krzysztof Parzyszek f62d44be28 Replace inherited constructor with an explicit one
Some bots failed when the inherited constructor was used.

llvm-svn: 257508
2016-01-12 19:27:59 +00:00
Dan Gohman 1d68e80f26 [WebAssembly] Make CFG stackification independent of basic-block labels.
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.

Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.

This will also fix problems where two constructs appear to have the same label,
because we no longer explicitly use labels, so consumers that need labels will
presumably create their own labels, and presumably they won't reuse labels
when they do.

This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.

llvm-svn: 257505
2016-01-12 19:14:46 +00:00
Krzysztof Parzyszek 1279881315 [Hexagon] Implement RDF-based post-RA optimizations
- Handle simple cases of register copies (what current RDF CP allows).
- Hexagon-specific dead code elimination: handles dead address updates
  in post-increment instructions.

llvm-svn: 257504
2016-01-12 19:09:01 +00:00
Krzysztof Parzyszek c09d630e50 RDF: Copy propagation
This is a very limited implementation of DFG-based copy propagation.
It only handles actual COPY instructions (does not handle other equivalents
such as add-immediate with a 0 operand).
The major limitation is that it does not update the DFG: that will be the
change required to make it more robust (hopefully coming up soon).

llvm-svn: 257490
2016-01-12 17:23:48 +00:00
Tom Stellard f421837250 AMDGPU: Emit note directive for HSA even if there are no functions
Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16010

llvm-svn: 257488
2016-01-12 17:18:17 +00:00
Krzysztof Parzyszek 6f4000e763 RDF: Dead code elimination
Utility class to perform DFG-based dead code elimination.

llvm-svn: 257485
2016-01-12 17:01:16 +00:00
Krzysztof Parzyszek 8dca45efa8 Fix compiler warnings from r257477
llvm-svn: 257483
2016-01-12 16:51:55 +00:00
Krzysztof Parzyszek acdff46a9c RDF: Implement register liveness analysis
Compute block live-ins and operand kill flags from the DFG.

llvm-svn: 257480
2016-01-12 15:56:33 +00:00
Daniel Sanders 5e1d5a789a [mips] Correct operand order in DSP's mthi/mtlo
Summary: The result register is the second operand as per the other mt* instructions.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D15993

llvm-svn: 257478
2016-01-12 15:15:14 +00:00
Krzysztof Parzyszek b5b5a1d7ad Register Data Flow: data flow graph
Target independent, SSA-based data flow framework for representing
data flow between physical registers.

This commit implements the creation of the actual data flow graph.

llvm-svn: 257477
2016-01-12 15:09:49 +00:00
Benjamin Kramer ab8cc02ba5 [Hexagon] Make helper function static. NFC.
llvm-svn: 257476
2016-01-12 14:58:49 +00:00