Commit Graph

36959 Commits

Nicolai Haehnle 19f0f5177d AMDGPU: change a redundant if () to an assert(). NFC
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19042

llvm-svn: 266344
2016-04-14 17:42:18 +00:00
Tom Stellard b72a65ff53 [GlobalISel] Coding style and whitespace fixes
Reviewers: qcolombet

Subscribers: joker.eph, llvm-commits, vkalintiris

Differential Revision: http://reviews.llvm.org/D19119

llvm-svn: 266342
2016-04-14 17:23:33 +00:00
Tim Northover cdf1529c01 AArch64: expand cmpxchg after regalloc at -O0.
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.

I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.

It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.

Should fix PR25526.

llvm-svn: 266339
2016-04-14 17:03:29 +00:00
Jacques Pienaar add4a274ba [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth.
Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

Reviewers: eliben, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18903

llvm-svn: 266338
2016-04-14 16:47:42 +00:00
Tom Stellard 79a1fd718c AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard f110f8f9f7 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though.

The register should be the next SGPR after all user and system SGPRs.
We rely on the fact that Mesa adds arguments for all input and system SGPRs
and take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Simon Dardis 53a3492b71 [mips] MIPSR6 Compact branch aliases
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

This patch was previously committed as r266055 but was reverted, as it seemed
to have caused some spurious test failures. Those failures did not reappear
after further local testing.

llvm-svn: 266301
2016-04-14 13:43:17 +00:00
Mehdi Amini 867e91468b Do not use getGlobalContext()... ever.
This code was creating a new type in the global context, regardless
of which context the user is sitting in. What could possibly go wrong?

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266275
2016-04-14 04:36:40 +00:00
Matt Arsenault 9cd90712f0 AMDGPU: Implement canonicalize
Also add generic DAG node for it.

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
Matthias Braun 46b0f03e12 TargetLowering: Factor out common code for tail call eligibility checking; NFC
llvm-svn: 266270
2016-04-14 01:10:42 +00:00
Tim Northover 5c02f9ad28 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Matthias Braun 707e02c273 ARM: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
of the function, so putting it into a callee save register should avoid
spills for the callers with only a minimal amount of extra spills in
the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D18901

llvm-svn: 266253
2016-04-13 21:43:25 +00:00
Matthias Braun 588d1cdad4 X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
of the function, so putting it into a callee save register should avoid
spills for the callers with only a minimal amount of extra spills in
the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Matthias Braun 74a0bd319a AArch64: Use a callee save register for swiftself parameters
It is very likely that the swiftself parameter is alive throughout most
of the function, so putting it into a callee save register should avoid
spills for the callers with only a minimal amount of extra spills in
the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D19007

llvm-svn: 266251
2016-04-13 21:43:16 +00:00
Tom Stellard b951f10701 AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18917

llvm-svn: 266244
2016-04-13 20:44:16 +00:00
Nemanja Ivanovic 87bcae366d [PowerPC] Basic support for P9 byte comparison and count trailing zero insns
This patch corresponds to review:
http://reviews.llvm.org/D17850

This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.

llvm-svn: 266228
2016-04-13 18:51:18 +00:00
Evandro Menezes 8d53f88162 [AArch64] Disable LDP/STP for quads
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.

Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.

llvm-svn: 266223
2016-04-13 18:31:45 +00:00
Tim Northover b8a1ecfc62 AArch64: don't create instructions that write to xzr/wzr twice.
These are unpredictable even on AArch64.

Patch by Yichao Yu.

llvm-svn: 266206
2016-04-13 16:25:39 +00:00
Artem Tamazov eb4d5a9b0b [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)
Tests added along with the implemented feature.
Note that there is a small leftover unnecessary MI scheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

llvm-svn: 266205
2016-04-13 16:18:41 +00:00
Zoran Jovanovic 2f6845ba39 [mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly
Differential Revision: http://reviews.llvm.org/D18995

llvm-svn: 266204
2016-04-13 16:02:25 +00:00
Vasileios Kalintiris 3751d4114c [mips] Sign-extend i32 values truncated from previously zero-extended i32 values.
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.

Additionally, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.
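
A concrete instance of the invariant, as a C++ sketch (the function and
variable names here are invented for illustration):

  #include <cstdint>

  // On MIPS64 an i32 value occupies a 64-bit register and must be held in
  // sign-extended form. A value zero-extended from i16 has bit 31 clear,
  // so its sign- and zero-extension to 64 bits coincide and no extra
  // sign-extension instruction is needed - the trunc + AssertZExt case
  // merged above.
  int64_t canonical_container(uint16_t v) {
    uint32_t widened = v;                  // zext i16 -> i32
    return static_cast<int32_t>(widened);  // sext i32 -> i64; bits identical
  }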

Reviewers: dsanders

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18893

llvm-svn: 266203
2016-04-13 15:07:45 +00:00
Zlatko Buljan 58d6a959be [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

This patch was reverted after the reversion of the dependent patch http://reviews.llvm.org/D17068
due to a test-suite failure. The problem is hopefully solved with the
dependent patch, so this patch is committed again.

llvm-svn: 266179
2016-04-13 08:02:26 +00:00
Hrvoje Varga 11dd31df9a [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

This change contains a fix for the failing test-suite, so this patch should hopefully work now.

llvm-svn: 266171
2016-04-13 06:17:21 +00:00
Tom Stellard 703b2ec43f AMDGPU/SI: Fix spilling of 96-bit registers
Summary:
It seems like this was broken in r252327.  I thought we had test cases
for this, but it's really hard to trigger spills of this exact register
size since they aren't used very much.

Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19021

llvm-svn: 266152
2016-04-12 23:57:30 +00:00
Evandro Menezes 551af44e31 [AArch64] Fuse AES{D,E}/AESMC for Exynos M1. (NFC)
llvm-svn: 266144
2016-04-12 22:42:36 +00:00
David L Kreitzer 99775c1b6e Fixed a few typos and formatting problems. NFCI.
llvm-svn: 266135
2016-04-12 21:45:09 +00:00
Justin Bogner 32ad24d4ef X86: Avoid accessing SDValues after they've been RAUW'd
This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress
matches an ADD with an AND as an operand, and that AND hits one of the
"heroic transforms" that folds masks and shifts, we end up with N
pointing to an SDNode that was deleted. Make sure we're done accessing
it before that.

Found by ASan with the recycling allocator changes in llvm.org/PR26808.

llvm-svn: 266130
2016-04-12 21:34:24 +00:00
Nicolai Haehnle df77c9ada4 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests in which I noticed intermittent failures
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
James Y Knight 19f6cce4e3 Add __atomic_* lowering to AtomicExpandPass.
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
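
To illustrate what this lowering amounts to at the source level (a sketch:
the struct and function names are invented, but the generic __atomic_load
entry point is the one the libatomic runtime provides):

  #include <cstddef>
  #include <cstdint>

  // Generic libatomic routine for atomic loads of arbitrary size.
  extern "C" void __atomic_load(size_t size, void *ptr, void *ret, int order);

  struct Pair { uint64_t a, b; };  // 16 bytes: often no native atomic support

  Pair load_pair(Pair *p) {
    Pair result;
    // One libcall per unsupported-size atomic operation; 5 == __ATOMIC_SEQ_CST.
    __atomic_load(sizeof(Pair), p, &result, 5);
    return result;
  }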

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266115
2016-04-12 20:18:48 +00:00
Tom Stellard ab1d3a9d50 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
2016-04-12 18:40:43 +00:00
Matt Arsenault 3b08238f78 AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.
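
In scalar terms the observation is the following (sketch, invented names):

  #include <cstdint>

  // When one operand is zero_extend from i32, its high 32 bits are known
  // zero, so after splitting the i64 OR the high half is just hi(a) and
  // only a single 32-bit OR remains for the low half.
  uint64_t or_with_zext(uint64_t a, uint32_t b) {
    return a | static_cast<uint64_t>(b);  // hi(result) == hi(a)
  }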

llvm-svn: 266104
2016-04-12 18:24:38 +00:00
Sanjay Patel e6a0a23e08 fix indentation; NFC
llvm-svn: 266097
2016-04-12 18:01:48 +00:00
Nicolai Haehnle 279970c0dc AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

llvm-svn: 266088
2016-04-12 16:10:38 +00:00
Petar Jovanovic 48e4db1ca2 [mips] add assembler support for .set arch=octeon
This patch enables assembler support for .set arch=octeon.
It will fix issues with inline assembler when this directive is used.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D18548

llvm-svn: 266081
2016-04-12 15:28:16 +00:00
Matt Arsenault 64fa2f4513 AMDGPU: Implement i64 global atomics
llvm-svn: 266075
2016-04-12 14:05:11 +00:00
Matt Arsenault a9dbdcae04 AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.
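
A scalar model of the clamping behavior as I understand it (a sketch, not
the intrinsics' actual signatures):

  #include <cstdint>

  // atomic_inc: wrap to 0 once the old value reaches the clamp.
  uint32_t inc_semantics(uint32_t old, uint32_t clamp) {
    return old >= clamp ? 0 : old + 1;
  }

  // atomic_dec: wrap to the clamp when the old value is 0 or out of range.
  uint32_t dec_semantics(uint32_t old, uint32_t clamp) {
    return (old == 0 || old > clamp) ? clamp : old - 1;
  }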

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Matt Arsenault 21ecfe43ba AMDGPU: Remove trailing whitespace
llvm-svn: 266073
2016-04-12 14:04:54 +00:00
Rafael Espindola d41b54be11 This reverts commit r266002, r266011 and r266016.
They broke the msan bot.

Original message:

Add __atomic_* lowering to AtomicExpandPass.

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266062
2016-04-12 12:30:25 +00:00
Simon Dardis ee1590f5f0 Revert "[mips] MIPSR6 Compact branch aliases"
This reverts commit r266055.

ps4-buildslave2 is highlighting a failure.

llvm-svn: 266061
2016-04-12 12:22:45 +00:00
Jonas Paulsson f0fc50905f [SystemZ] Use LDE32 instead of LE, when Offset is small.
On z13, if eliminateFrameIndex() chooses LE (and not LEY), immediately
transform that LE to LDE32 to avoid partial register dependencies.

LEY should be generally preferred for big offsets over an expansion
into LAY + LDE32.

Reviewed by Ulrich Weigand.

llvm-svn: 266060
2016-04-12 12:07:23 +00:00
Simon Dardis 703c864fe3 [mips] MIPSR6 Compact branch aliases
Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

Reviewers: dsanders

Differential Revision: http://reviews.llvm.org/D18856

llvm-svn: 266055
2016-04-12 10:41:53 +00:00
Chuang-Yu Cheng 94f58e79ae [PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D18884

llvm-svn: 266040
2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng 6efde2fb45 [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has shorter latency compared to
mfcr.

Thanks to Nemanja for the invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton

http://reviews.llvm.org/D17749

llvm-svn: 266038
2016-04-12 03:04:44 +00:00
Matthias Braun dff243e597 AArch64: Drive-by cleanup
llvm-svn: 266035
2016-04-12 02:16:13 +00:00
Tim Northover a6dea06fe3 ARM: use r7 as the frame-pointer on all MachO targets.
This is better for a few reasons:
  + It matches the other tooling for iOS.
  + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
    use ARM mode).
  + It leads to infinitesimally smaller code (0.2%, yay!).

rdar://25369506

llvm-svn: 266003
2016-04-11 22:27:40 +00:00
James Y Knight b91d38c5fe Add __atomic_* lowering to AtomicExpandPass.
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266002
2016-04-11 22:22:33 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Tom Stellard 0ffdf65eaa Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

llvm-svn: 265992
2016-04-11 20:38:40 +00:00
Hans Wennborg 1f09485c40 Fix broken assert, PR24624
llvm-svn: 265989
2016-04-11 20:35:41 +00:00
Sriraman Tallam f39e190ad8 Test commit.
llvm-svn: 265976
2016-04-11 18:40:50 +00:00
Petar Jovanovic e578e970cb [mips] Make Static a default relocation model for MIPS codegen
This change follows the defaults of GCC and Clang, so LLVM does not differ
from them. While a number of test files are touched by this change, they
all keep the old (expected) behaviour with the explicit option
"-relocation-model=pic". The tests that have not been touched are
insensitive to the relocation model.

Differential Revision: http://reviews.llvm.org/D17995

llvm-svn: 265949
2016-04-11 15:24:23 +00:00
Daniel Sanders a45d3e439f [mips] Trivial corrections to range checked immediates.
Summary:
SYNC has a 5-bit unsigned immediate.
Move MIPS16-specific pcrel16 operand to Mips16 files.

Reviewers: vkalintiris

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18755

llvm-svn: 265947
2016-04-11 15:20:40 +00:00
Ulrich Weigand aa04768600 [SystemZ] README: remove an implemented idea, add some new ones
The note about conditional returns can now be removed, as they are
implemented. Let's also add 2 new ones in exchange.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18962

llvm-svn: 265944
2016-04-11 14:38:47 +00:00
Ulrich Weigand 1bac911c58 [SystemZ] Add SVC instruction
This is going to be useful for inline assembly only.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18952

llvm-svn: 265943
2016-04-11 14:35:39 +00:00
Oliver Stannard c869e9158d [ARM] Avoid switching ARM/Thumb mode on .arch/.cpu directive
When we see a .arch or .cpu directive, we should try to avoid switching
ARM/Thumb mode if possible.

If we do have to switch modes, we also need to emit the correct mapping
symbol for the new ISA. We did not do this previously, so could emit
ARM code with Thumb mapping symbols (or vice-versa).

The GAS behaviour is to always stay in the same mode, and to emit an
error on any instructions seen when the current mode is not available on
the current target. We can't represent that situation easily (we assume
that Thumb mode is available if ModeThumb is set), so we differ from the
GAS behaviour when switching to a target that can't support the old
mode. I've added a warning for when this implicit mode-switch occurs.

Differential Revision: http://reviews.llvm.org/D18955

llvm-svn: 265936
2016-04-11 13:06:28 +00:00
Ulrich Weigand 848a513d0a [SystemZ] Support conditional indirect sibling calls via BCR
This adds a conditional variant of the CallBR instruction, CallBCR. Also,
it can be fused with integer comparisons, resulting in one of the new
C*BCall instructions.

In addition to CallBRCL limitations, this has another one: it won't
trigger if the function to call isn't already in %r1 - see f22 in the
test for an example (it's also why the loads in tests are volatile).

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18928

llvm-svn: 265933
2016-04-11 12:12:32 +00:00
Ulrich Weigand fb97c51f6f [SystemZ] Remove incorrect CC use for C*BReturn instructions
These are fused compare-and-branches, so they obviously don't use CC.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18927

llvm-svn: 265932
2016-04-11 12:03:30 +00:00
Andrey Turetskiy 9df334c28e [X86] Restrict max long nop length for Lakemont.
Restrict the max length of long nops for Lakemont to 7. Experiments on MCU
benchmarks (Dhrystone, Coremark) show that this is the optimal length.

Differential Revision: http://reviews.llvm.org/D18897

llvm-svn: 265924
2016-04-11 10:07:36 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Craig Topper 35db8ecb50 [X86] Use for loops over types to reduce code for setting up operation actions.
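The rough shape of this kind of cleanup (an illustrative fragment from a
hypothetical target-lowering constructor; the specific types and actions are
placeholders, setOperationAction is the real hook):

  // Before: one setOperationAction call per (op, type) pair, repeated
  // for every vector type. After: a single loop over the types.
  for (MVT VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64}) {
    setOperationAction(ISD::MUL,  VT, Custom);
    setOperationAction(ISD::CTLZ, VT, Expand);
  }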
llvm-svn: 265893
2016-04-10 05:39:32 +00:00
Craig Topper dcc8f49bf0 [X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is supported. This is already done for SSE2/AVX2 which VLX implies. NFC
llvm-svn: 265892
2016-04-10 05:39:28 +00:00
Davide Italiano 7aa47094b2 [MC] support TLSDESC and TLSCALL / GNU2 tls dialect
Differential Revision:  http://reviews.llvm.org/D18885

llvm-svn: 265881
2016-04-09 20:32:33 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164
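
A minimal source-level example of the pattern (invented names):

  // With BMI, `x & ~mask` whose only use is the compare can be emitted as
  // a single ANDN (which sets ZF) instead of NOT + TEST.
  bool bits_clear(unsigned x, unsigned mask) {
    return (x & ~mask) == 0;
  }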

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim 1cc5712763 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Craig Topper f027107094 [X86] Use for loops over types to reduce code for setting up operation actions. NFC
llvm-svn: 265871
2016-04-09 06:31:02 +00:00
Craig Topper e801ed9e15 [X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC
llvm-svn: 265870
2016-04-09 05:53:48 +00:00
Tim Shen 0012756489 [SSP] Remove llvm.stackprotectorcheck.
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).

llvm-svn: 265851
2016-04-08 21:26:31 +00:00
Hans Wennborg e25b65bdb7 Rangeify a loop. NFC.
llvm-svn: 265846
2016-04-08 20:46:09 +00:00
Hans Wennborg 74ff770670 Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC
These are already defined, with the same values, a few lines up. NFC.

llvm-svn: 265845
2016-04-08 20:46:00 +00:00
Kevin B. Smith e0a6fc3bcc [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Ulrich Weigand fa2dffbc1a [SystemZ] Support conditional sibling calls via BRCL
This adds a conditional variant of the CallJG instruction, CallBRCL.
It can be used for conditional sibling calls. Unfortunately, due
to IfCvt limitations, it only really works well for functions without
arguments.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18864

llvm-svn: 265814
2016-04-08 17:22:19 +00:00
Sam Parker 2d5126cdf5 [ARM] Enable SMLAW[B|T] and SMULW[B|T] instruction selection
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.

Differential Revision: http://reviews.llvm.org/D18892

llvm-svn: 265793
2016-04-08 16:02:53 +00:00
Simon Pilgrim 476170384f [X86] Tidied up shuffle decode function doxygen descriptions
As discussed on D18441: auto brief is used so we don't need \brief or the
function name, and some missing descriptions were added.

llvm-svn: 265785
2016-04-08 14:17:07 +00:00
Chuang-Yu Cheng 98c1894755 CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Vasileios Kalintiris 957d849e03 [mips] Use range-based for loops. NFC.
llvm-svn: 265780
2016-04-08 10:33:00 +00:00
Zlatko Buljan 53a037f5cc [mips][microMIPS] Add CodeGen support for ADD, ADDIU*, ADDU* and DADD* instructions
Differential Revision: http://reviews.llvm.org/D16454

llvm-svn: 265772
2016-04-08 07:27:26 +00:00
Quentin Colombet 88f7f6bc4f [AArch64] Fix a typo in the register class to register bank mapping.
For GPR family we want the GPR register bank, not FPR!

llvm-svn: 265743
2016-04-07 23:10:14 +00:00
Quentin Colombet 846219ae10 [AArch64] Get rid of some GlobalISel ifdefs.
llvm-svn: 265725
2016-04-07 21:24:40 +00:00
Quentin Colombet 6cc73ce808 [AArch64] gcc does not like literals without quotes even in preprocessor macros.
llvm-svn: 265720
2016-04-07 20:49:15 +00:00
Quentin Colombet 789ad56248 [AArch64][CallLowering] Do not build the API if GlobalISel is not built.
This gets rid of some ifdefs and dummy implementations that were here
just to fill the blanks.

llvm-svn: 265719
2016-04-07 20:47:51 +00:00
Quentin Colombet d4131814b3 [GlobalISel] Add RegBankSelect hooks into the pass pipeline.
Now, RegBankSelect will happen after the IRTranslation and the target
may optionally add additional passes in between.

llvm-svn: 265716
2016-04-07 20:27:33 +00:00
Jan Vesely 43b7b5b846 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
Tom Stellard 9112758077 AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

llvm-svn: 265708
2016-04-07 18:30:05 +00:00
Quentin Colombet d21115876c [RegisterBank] Rename RegisterBank::contains into RegisterBank::covers.
llvm-svn: 265695
2016-04-07 17:09:39 +00:00
Ulrich Weigand 79391ee0f2 [SystemZ] Fix build break from r265689
Fix build error seen on some build bots due to:
error: default label in switch which covers all enumeration values

llvm-svn: 265693
2016-04-07 16:33:25 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Ulrich Weigand 2eb027d21f [SystemZ] Implement conditional returns
Return is now considered a predicable instruction, and is converted
to a newly-added CondReturn (which maps to BCR to %r14) instruction by
the if conversion pass.

Also, fused compare-and-branch transform knows about conditional
returns, emitting the proper fused instructions for them.

This transform triggers on a *lot* of tests, hence the huge diffstat.
The changes are mostly jX to br %r14 -> bXr %r14.

Author: koriakin

Differential Revision: http://reviews.llvm.org/D17339

llvm-svn: 265689
2016-04-07 16:11:44 +00:00
Ehsan Amiri 4701a91e59 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases have been modified so that they pass after this
change. One testcase is deleted, because I realized that even after undoing
the original change committed with it, the testcase still passes. The change
to one other testcase (test/CodeGen/PowerPC/pr25802.ll) is an arbitrary change
to keep it passing. Given the original intention of the testcase, and the fact
that fixing it properly would take some time, we concluded that this quick
change will be enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
Tom Stellard d37630e461 AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates
Summary: This makes it possible to insert nops at the end of blocks.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18549

llvm-svn: 265678
2016-04-07 14:47:07 +00:00
Valery Pykhtin e23b6deb01 [AMDGPU] fix readlane/readfirstlane src vgpr operand type.
For a VGPR_32 operand the disassembler expects a VGPR register encoded as 0..255 (an enum8 src operand).
readfirstlane/readlane actually have an enum9 operand, so this change switches VGPR_32 to VS_32 (enum9 encoding).

Differential Revision: http://reviews.llvm.org/D18696

llvm-svn: 265670
2016-04-07 13:41:51 +00:00
Benjamin Kramer 4fb78518f1 Make helper functions static. NFC.
llvm-svn: 265653
2016-04-07 10:10:09 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
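
At the source level the targeted pattern looks roughly like this (a sketch;
the function name is invented):

  #include <atomic>

  // The atomic add's result is used only by a sign comparison against
  // zero, so the backend can emit `lock add` and reuse EFLAGS instead of
  // falling back to LXADD.
  bool was_negative(std::atomic<int> &counter) {
    // fetch_add returns the old value; per the combines above, the sign
    // check is recoverable from the flags the increment leaves behind.
    return counter.fetch_add(1) < 0;
  }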

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Quentin Colombet b073c12912 [AArch64] Teach RegisterBankInfo about the CC register bank.
We need to cover each register class with a register bank.

llvm-svn: 265629
2016-04-07 00:39:29 +00:00
Quentin Colombet cbc353a422 [AArch64] Teach RegisterBankInfo about the mapping of register classes
on register banks.

llvm-svn: 265626
2016-04-07 00:14:30 +00:00
Hans Wennborg ab16be799c Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offsets to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
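
To make the motivation concrete, a sketch (the enumerator order and predicate
are invented, not LLVM's actual definitions):

  enum class AtomicOrdering { NotAtomic, Unordered, Monotonic, Acquire,
                              Release, AcquireRelease,
                              SequentiallyConsistent };

  bool operator<(AtomicOrdering, AtomicOrdering) = delete;

  // The enum class stops implicit conversion to int, and the deleted
  // operator stops raw comparisons such as
  //   AO < AtomicOrdering::Acquire   // no longer compiles
  // which wrongly treat enumerator order as ordering strength. Callers
  // must go through an explicit predicate instead:
  bool isAtLeastAcquire(AtomicOrdering AO) {
    return AO == AtomicOrdering::Acquire ||
           AO == AtomicOrdering::AcquireRelease ||
           AO == AtomicOrdering::SequentiallyConsistent;
  }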

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Ehsan Amiri 322eca3849 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the integer value loaded is never used directly as an integer, we should
use VSX or Floating Point Facility integer loads and avoid the extra direct move.
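
The kind of source pattern this targets (sketch; the function name is invented):

  // The i32 load's only use is the conversion to double, so it can be done
  // with an FP/VSX-facility integer load (e.g. lfiwax/lxsiwax) feeding the
  // conversion, instead of a GPR load plus a GPR-to-VSR direct move.
  double int_to_fp(const int *p) {
    return static_cast<double>(*p);
  }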

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
James Y Knight 037b9894bd Put quotes around #error string.
GCC reports "missing terminating ' character", even when it's being
skipped by preprocessing.
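
A minimal reproduction of the behavior (sketch):

  #if 0
  #error don't do this    // GCC still lexes the line and reports
                          // a missing terminating ' character
  #error "don't do this"  // fine: the apostrophe sits inside a string
  #endif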

llvm-svn: 265590
2016-04-06 19:52:32 +00:00
Nicolai Haehnle df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between Mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00