Commit Graph

35996 Commits

Author SHA1 Message Date
Kyle Butt d62d8b771d Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates.
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
    ;v6 = v6 + v5 * v7

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
    ;v5 = v5 * v7 + v96

This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.

    %vreg6<def> = COPY %vreg96
    %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
    ;v6 = v6 + v5 * v6

is replaced by

    %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
    ;v5 = v5 * v96 + v96

llvm-svn: 259617
2016-02-03 01:41:09 +00:00
Yunzhong Gao eb959722a7 Revert r259576: Disable the vzeroupper insertion pass on PS4.
Will re-implement based on review feedback.

llvm-svn: 259615
2016-02-03 01:25:12 +00:00
Yunzhong Gao b76ccacfb1 Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.

Original patch by: Sean Silva

llvm-svn: 259576
2016-02-02 21:39:23 +00:00
Matt Arsenault de4208122b AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault 7e747f1a38 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Quentin Colombet b8fb2ba1bb [X86] Fix the merging of SP updates in prologue/epilogue insertions.
When the merging was involving LEAs, we were taking the wrong immediate
from the list of operands.

rdar://problem/24446069

llvm-svn: 259553
2016-02-02 20:11:17 +00:00
Matt Arsenault 8b175672cb AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault fb8cdbae0c AMDGPU: Minor cleanups for AMDGPUPromoteAlloca
Mostly convert to use range loops.

llvm-svn: 259550
2016-02-02 19:32:35 +00:00
Matt Arsenault e5737f7cac AMDGPU: Report AMDGPUPromoteAlloca changed the function
llvm-svn: 259547
2016-02-02 19:18:57 +00:00
Matt Arsenault ad1348459f AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault 853a1fc6d9 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Eugene Zelenko ecefe5a81f Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.
Differential revision: http://reviews.llvm.org/D16793

llvm-svn: 259539
2016-02-02 18:20:45 +00:00
Derek Schuff c6d8fd3f54 [MC] Enable eip-relative addressing on x86-64 for X32 ABI
Summary:
Enables eip-based addressing, e.g.,

lea    constant(%eip), %rax
lea    constant(%eip), %eax

in MC, (used for the x32 ABI). EIP-base addressing is also valid in x86_64,
it is left enabled for that architecture as well.

Patch by João Porto

Differential Revision: http://reviews.llvm.org/D16581

llvm-svn: 259528
2016-02-02 17:20:04 +00:00
Chad Rosier 1142f3cf90 [AArch64] Add a FIXME comment.
llvm-svn: 259515
2016-02-02 15:22:55 +00:00
Chad Rosier bba881ef3d [AArch64] Allocate the modified and used regs only once per function.
llvm-svn: 259510
2016-02-02 15:02:30 +00:00
JF Bastien 926b189a81 WebAssembly: update expected GCC torture test failures
The 3 programs used __attribute__((mode(?))) on enum, which clang r259497 fixed.

llvm-svn: 259508
2016-02-02 14:27:34 +00:00
Oliver Stannard 7e7d983a87 Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00
Simon Pilgrim 96fe4ef5f7 [X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding
llvm-svn: 259496
2016-02-02 13:32:56 +00:00
JF Bastien dc1255f02f WebAssembly: add option to disable register coloring
Having this hidden option makes it easier to debug other issues.

llvm-svn: 259482
2016-02-02 09:30:01 +00:00
Sjoerd Meijer ffe19f5245 Removed FeatureVFPOnlySP from the Cortex-R7 processor model
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.

llvm-svn: 259480
2016-02-02 09:28:20 +00:00
Sanjoy Das 881de4d12a [X86] Fix a bug in getMemOpBaseRegImmOfs
Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of
`MemOp` is a frame index memory operand.  The fix is to have
`getMemOpBaseRegImmOfs` bail out in such cases.  We can possibly be more
clever here, if needed.

llvm-svn: 259456
2016-02-02 02:32:43 +00:00
Ahmed Bougacha 68a8efa374 [X86][FastISel] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
FastISel counterpart to r259448.

llvm-svn: 259449
2016-02-02 01:44:03 +00:00
Ahmed Bougacha 55c6682ae2 [X86] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR.
Officially, we don't acknowledge non-default configurations of MXCSR,
as getting there would require usage of the FENV_ACCESS pragma (at
least insofar as rounding mode is concerned).

We don't support the pragma, so we can assume that the default
rounding mode - round to nearest, ties to even - is always used.

However, it's inconsistent with the rest of the instruction set,
where MXCSR is always effective (unless otherwise specified).
Also, it's an unnecessary obstacle to the few brave souls that use
fenv.h with LLVM.

Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead.

llvm-svn: 259448
2016-02-02 01:32:50 +00:00
Sanjay Patel c54600dbb1 fix typos; NFC
llvm-svn: 259438
2016-02-01 23:53:35 +00:00
Simon Pilgrim 5be17b6e3e [X86][AVX512] Add support for AVX512 VMOVD (load) shuffle decoding
llvm-svn: 259430
2016-02-01 23:04:05 +00:00
Simon Pilgrim f5c23ad3d7 [X86][AVX512] Add support for AVX512 VMOVSD/VMOVSS shuffle decoding
llvm-svn: 259427
2016-02-01 22:26:28 +00:00
Simon Pilgrim 025a3d857a [X86][AVX512] Add support for AVX512 VINSERTPS shuffle decoding
llvm-svn: 259420
2016-02-01 22:05:50 +00:00
Matthias Braun 3f88eabe93 SmallSet/SmallPtrSet: Refuse huge Small numbers
These sets do linear searching in small mode; It is not a good idea to
use huge numbers as the small value here, save people from themselves by
adding a static_assert.

Differential Revision: http://reviews.llvm.org/D16706

llvm-svn: 259419
2016-02-01 22:05:16 +00:00
Chad Rosier dbdb1d6eaf Move comments a bit closer to associated code. NFC.
llvm-svn: 259411
2016-02-01 21:38:31 +00:00
Chad Rosier 064261da16 Remove extra semicolon. NFC.
llvm-svn: 259402
2016-02-01 20:54:36 +00:00
Balaram Makam 92431703d7 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Geoff Berry 29d4a695f4 [AArch64] Simplify prolog/epilog callee save/restore. NFC.
Summary:
Factor out common code for callee-save register pair calculation.  This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Depends on D16732

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16734

llvm-svn: 259384
2016-02-01 19:07:06 +00:00
Ulrich Weigand 4a4d4ab7a4 [SystemZ] Fix wrong-code generation for certain always-false conditions
We've found another bug in the code generation logic conditions for a
certain class of always-false conditions, those of the form
   if ((a & 1) < 0)

These only reach the back end when compiling without optimization.

The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
   if ((a & MASK) < VAL)
as
   if ((a & MASK) == 0)

where VAL is less than the the lowest bit of MASK.  This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.

Fixed by excluding that particular case.

llvm-svn: 259381
2016-02-01 18:31:19 +00:00
Colin LeMahieu 6fdfa3dc32 [NFC] Referencing manual for reason why subregbit is checked
llvm-svn: 259380
2016-02-01 18:15:39 +00:00
Geoff Berry 04bf91a8c1 [AArch64] Simplify callee-save register save/restore. NFC.
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.

This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16732

llvm-svn: 259365
2016-02-01 16:29:19 +00:00
Asaf Badouh 5a3a0231f4 [X86][AVX512VBMI] add encoding and intrinsics for Multishift
Differential Revision: http://reviews.llvm.org/D16399

llvm-svn: 259363
2016-02-01 15:48:21 +00:00
Daniel Sanders f8bb23e509 [mips] Range check uimm16 and fix several bugs this revealed.
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq do.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
  on architecture. I'll fix the simm64 case properly when I reach simm32.

lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15446

llvm-svn: 259360
2016-02-01 15:13:31 +00:00
JF Bastien a5b8ea0d66 WebAssembly NFC: simplify control flow
This should now be easier to read.

llvm-svn: 259349
2016-02-01 10:46:16 +00:00
Igor Breger 56b039ea17 AVX512: fix mask handling for gather/scatter/prefetch intrinsics.
Differential Revision: http://reviews.llvm.org/D16755

llvm-svn: 259346
2016-02-01 09:57:15 +00:00
Simon Pilgrim 1358d86659 [X86][SSE] Find source of the inserted element of INSERTPS
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.

Differential Revision: http://reviews.llvm.org/D16652

llvm-svn: 259343
2016-02-01 08:59:30 +00:00
Igor Breger 6cc9115cec AVX512 : Fix SETCCE lowering for KNL 32 bit.
Differential Revision: http://reviews.llvm.org/D16752

llvm-svn: 259342
2016-02-01 07:56:09 +00:00
David Majnemer efb41741f2 [X86] Cleanup the WinEHState pass
Remove unnecessary includes and class state.

No functional change intended.

llvm-svn: 259340
2016-02-01 04:28:59 +00:00
Craig Topper 3ef74f5956 Replace usages of llvm::utostr_32 with just llvm::utostr. While this is less efficient, its unclear the few places that were using the _32 version were doing so for efficiency.
llvm-svn: 259330
2016-01-31 20:00:24 +00:00
JF Bastien 578c8cde53 WebAssembly: more failures are gone
llvm-svn: 259321
2016-01-31 08:19:40 +00:00
JF Bastien ac9e8664a4 WebAssembly: update expected failures
r259305 fixed a few assertions around FrameIndex, and I forgot to update these failures despite having run the torture tests.

llvm-svn: 259320
2016-01-31 08:05:05 +00:00
Derek Schuff c97ba939d1 [WebAssembly] Fix uses of FrameIndex as store values
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.

llvm-svn: 259306
2016-01-30 21:43:08 +00:00
JF Bastien fbc89d21dd WebAssembly: don't optimize frameindex store
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.

This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.

llvm-svn: 259305
2016-01-30 14:11:26 +00:00
JF Bastien 3ca3ea690f WebAssembly NFC: fix build warning
WebAssemblyFrameLowering.cpp:158:44: warning: enumeral and non-enumeral type in conditional expression [enabled by default]

llvm-svn: 259303
2016-01-30 11:19:26 +00:00
Matt Arsenault e013246462 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault d0799df707 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00