Commit Graph

33184 Commits

Author SHA1 Message Date
Sanjay Patel 801caff64d propagate IR-level fast-math-flags to DAG nodes (NFC)
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.

There are 2 structural changes here:

1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics 
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.

2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes. 

Differential Revision: http://reviews.llvm.org/D8900

llvm-svn: 236546
2015-05-05 21:40:38 +00:00
Sanjay Patel fbca70d767 use range-based for-loop; NFC
llvm-svn: 236544
2015-05-05 21:20:52 +00:00
Peter Collingbourne 85a0e23bc8 Thumb2SizeReduction: Check the correct set of registers for LDMIA.
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.

Also clean up some dead code:

- The isARMLowRegister check is redundant with what VerifyLowRegs does;
  replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
  does not appear in ReduceTable).

Differential Revision: http://reviews.llvm.org/D9485

llvm-svn: 236535
2015-05-05 20:07:10 +00:00
Ulrich Weigand c1708b2618 [SystemZ] Add vector intrinsics
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.

For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.

For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result.  Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.

Based on a patch by Richard Sandiford.

llvm-svn: 236527
2015-05-05 19:31:09 +00:00
Ulrich Weigand 5211f9ff4d [SystemZ] Mark v1i128 and v1f128 as unsupported
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers.  We do not yet support those types, and
some infrastructure is missing before we can do so.

In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.

llvm-svn: 236526
2015-05-05 19:30:05 +00:00
Ulrich Weigand cd2a1b5341 [SystemZ] Handle sub-128 vectors
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register.  We therefore
want to legalize those types by widening the vector rather than promoting
the elements.

The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results.  One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.

Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended.  Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.

Based on a patch by Richard Sandiford.

llvm-svn: 236525
2015-05-05 19:29:21 +00:00
Ulrich Weigand 49506d78e7 [SystemZ] Add CodeGen support for scalar f64 ops in vector registers
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers.  It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.

Based on a patch by Richard Sandiford.

llvm-svn: 236524
2015-05-05 19:28:34 +00:00
Ulrich Weigand 80b3af7ab3 [SystemZ] Add CodeGen support for v4f32
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used.  Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.

This particularly helps with llvmpipe.

Based on a patch by Richard Sandiford.

llvm-svn: 236523
2015-05-05 19:27:45 +00:00
Ulrich Weigand cd808237b2 [SystemZ] Add CodeGen support for v2f64
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.

Based on a patch by Richard Sandiford.

llvm-svn: 236522
2015-05-05 19:26:48 +00:00
Ulrich Weigand ce4c109585 [SystemZ] Add CodeGen support for integer vector types
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility.  This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).

When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
  (except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.

The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.

However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.

These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level.  This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.

Based on a patch by Richard Sandiford.

llvm-svn: 236521
2015-05-05 19:25:42 +00:00
Ulrich Weigand a8b04e1cbc [SystemZ] Add z13 vector facility and MC support
This patch adds support for the z13 processor type and its vector facility,
and adds MC support for all new instructions provided by that facilily.

Apart from defining the new instructions, the main changes are:

- Adding VR128, VR64 and VR32 register classes.
- Making FP64 a subclass of VR64 and FP32 a subclass of VR32.
- Adding a D(V,B) addressing mode for scatter/gather operations
- Adding 1-, 2-, and 3-bit immediate operands for some 4-bit fields.
  Until now all immediate operands have been the same width as the
  underlying field (hence the assert->return change in decode[SU]ImmOperand).

In addition, sys::getHostCPUName is extended to detect running natively
on a z13 machine.

Based on a patch by Richard Sandiford.

llvm-svn: 236520
2015-05-05 19:23:40 +00:00
Reid Kleckner 0738a9c02e Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236360.

This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.

llvm-svn: 236508
2015-05-05 17:44:16 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Kit Barton d4eb73c00e This patch adds ABI support for v1i128 data type.
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.

This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.

Phabricator review: http://reviews.llvm.org/D9475

llvm-svn: 236503
2015-05-05 16:10:44 +00:00
Daniel Sanders eda60d217b [mips] Generate code for insert/extract operations when using the N64 ABI and MSA.
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.

This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9342

llvm-svn: 236494
2015-05-05 10:32:24 +00:00
Daniel Sanders 4160c802d9 [mips][msa] Test basic operations for the N32 ABI too.
Summary:
This required adding instruction aliases for dneg.

N64 will be enabled shortly but requires additional bugfixes.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9341

llvm-svn: 236489
2015-05-05 08:48:35 +00:00
Reid Kleckner 9dad227b85 [X86] Fix assertion while DAG combining offsets and ExternalSymbols
ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes.

llvm-svn: 236471
2015-05-04 23:22:36 +00:00
Pete Cooper 4dddbcfbb1 [ARM] IT block insertion needs to update kill flags
When forming an IT block from the first MOV here:

	%R2<def> = t2MOVr %R0, pred:1, pred:%CPSR, opt:%noreg
	%R3<def> = tMOVr %R0<kill>, pred:14, pred:%noreg

the move in to R3 is moved out of the IT block so that later instructions on the same predicate can be inside this block, and we can share the IT instruction.

However, when moving the R3 copy out of the IT block, we need to clear its kill flags for anything in use at this point in time, ie, R0 here.

This appeases the machine verifier which thought that R0 wasn't defined when used.

I have a test case, but its extremely register allocator specific.  It would be too fragile to commit a test which depends on the register allocator here.

llvm-svn: 236468
2015-05-04 22:44:47 +00:00
Reid Kleckner b61f06c9c2 Fix -Wmicrosoft warning by making enum unsigned
llvm-svn: 236436
2015-05-04 18:21:35 +00:00
Ulrich Weigand 9ac2f9b2d8 [SystemZ] Reclassify f32 subregs of f64 registers
At the moment, all subregs defined by the SystemZ target can be modified
independently of the wider register.  E.g. writing to a GR32 does not
change the upper 32 bits of the GR64.  Writing to an FP32 does not change
the lower 32 bits of the FP64.

Hoewver, the upcoming support for the vector extension redefines FP64 as
one half of a V128.  Floating-point operations leave the other half of
a V128 in an unpredictable state, so it's no longer the case that writing
to an FP32 leaves the bits of the underlying register (the V128) alone.
I'd prefer to have separate subreg_ names for this situation, so that
it's obvious at a glance whether we're talking about a subreg that leaves
the other parts of the register alone.

No behavioral change intended.

Patch originally by Richard Sandiford.

llvm-svn: 236433
2015-05-04 17:41:22 +00:00
Ulrich Weigand 1f698b003c [SystemZ] Clean up AsmParser isMem() handling
We know what MemoryKind an operand has at the time we construct it,
so we might as well just record it in an unused part of the structure.
This makes it easier to add scatter/gather addresses later.

No behavioral change intended.

Patch originally by Richard Sandiford.

llvm-svn: 236432
2015-05-04 17:40:53 +00:00
Ulrich Weigand 1c6f07d616 [SystemZ] Fix getTargetNodeName
It seems SystemZTargetLowering::getTargetNodeName got out of sync with
some recent changes to the SystemZISD opcode list.  Add back all the
missing opcodes (and re-sort to the same order as SystemISelLowering.h).

llvm-svn: 236430
2015-05-04 17:39:40 +00:00
Tom Stellard b81f4aa952 R600/SI: Code cleanup
This is a follow-up to r236004

llvm-svn: 236427
2015-05-04 16:45:08 +00:00
Elena Demikhovsky 60eb9db7bb AVX-512: added calling convention for i1 vectors in 32-bit mode.
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.

llvm-svn: 236420
2015-05-04 12:40:50 +00:00
Elena Demikhovsky 52266388f8 AVX-512: added integer "add" and "sub" instructions with saturation for SKX
with intrinsics and tests

by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236418
2015-05-04 12:35:55 +00:00
Elena Demikhovsky 2557a22be7 AVX-512: Added VPACK* instructions forms for KNL and SKX
and their intrinsics
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236414
2015-05-04 09:14:02 +00:00
Elena Demikhovsky 1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim d5e20306cc [SSE2] Minor tidyup of v16i8 SHL lowering. NFC.
Removed code that was replicating v8i16 'shift + mask' implementation that is done more nicely by making use of LowerScalarImmediateShift

llvm-svn: 236388
2015-05-02 14:42:43 +00:00
Reid Kleckner 83d89fa546 Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236359. Things are still broken despite testing. :(

llvm-svn: 236360
2015-05-01 22:50:14 +00:00
Reid Kleckner 51476acd77 Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236340.

llvm-svn: 236359
2015-05-01 22:40:25 +00:00
Quentin Colombet 0de2346859 [AArch64][FastISel] Variant of the logical instructions that use two input
registers cannot write on SP.

rdar://problem/20748715

llvm-svn: 236352
2015-05-01 21:34:57 +00:00
Colin LeMahieu 6efd273a61 [Hexagon] Removing variable unused in release.
llvm-svn: 236351
2015-05-01 21:30:22 +00:00
Colin LeMahieu b662565475 [Hexagon] Adding expression MC emission and removing XFAIL from test that hits this code path.
llvm-svn: 236348
2015-05-01 21:14:21 +00:00
Quentin Colombet 9df2fa261b [AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences.
rdar://problem/20748715

llvm-svn: 236346
2015-05-01 20:57:11 +00:00
Reid Kleckner 2747d3d55a Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236339, it breaks the win32 clang-cl self-host.

llvm-svn: 236340
2015-05-01 20:14:04 +00:00
Reid Kleckner 4856fc61b4 [WinEH] Add an EH registration and state insertion pass for 32-bit x86
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.

I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining.  WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.

Reviewers: andrew.w.kaylor, majnemer

Differential Revision: http://reviews.llvm.org/D9422

llvm-svn: 236339
2015-05-01 20:04:54 +00:00
Pete Cooper f68d5038e6 [ARM] Transfer the internal flag in thumb2 size reduction.
Converting from t2LDRs to tLDRr caused the shift argument to drop the internal flag.  This would then throw machine verifier errors.

Unfortunately i'm having trouble reducing a test case.  I'm going to keep trying, but so far its a scary combination of machine sinking, an 'and i1', loads feeding loads, and a bunch of code which shouldn't change IT block formation, but does.  Its not useful to commit a test in that state as we have no way of knowing if it even hits this code reliably in future.

rdar://problem/20752113

llvm-svn: 236333
2015-05-01 18:57:32 +00:00
Peter Collingbourne d27d3a151f ARM: Align functions containing Thumb-2 jump tables to 4 bytes.
Functions with jump tables need an alignment of 4 because they use the ADR
instruction, which aligns the PC to 4 bytes before adding an offset.

Differential Revision: http://reviews.llvm.org/D9424

llvm-svn: 236327
2015-05-01 18:05:59 +00:00
James Y Knight 35e04e84fa [Sparc] Repair fixups in little endian mode.
Differential Revision: http://reviews.llvm.org/D9434

llvm-svn: 236324
2015-05-01 17:13:02 +00:00
Toma Tabacu 00e9867988 [mips] [IAS] Fix error messages for using LI with 64-bit immediates.
Summary:
LI should never accept immediates larger than 32 bits.
The additional Is32BitImm boolean also paves the way for unifying the functionality that LA and LI have in common.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9289

llvm-svn: 236313
2015-05-01 12:19:27 +00:00
Toma Tabacu a2861db834 [mips] [IAS] Slightly improve shift instruction generation in expandLoadImm.
Summary:
Generate one DSLL32 of 0 instead of two consecutive DSLL of 16.
In order to do this I had to change createLShiftOri's template argument from a bool to an unsigned.

This also gave me the opportunity to rewrite the mips64-expansions.s test, as it was testing the same cases multiple times and skipping over other cases.
It was also somewhat unreadable, as the CHECK lines were grouped in a huge block of text at the beginning of the file.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8974

llvm-svn: 236311
2015-05-01 10:26:47 +00:00
Tom Stellard aa798340c3 R600/SI: Add VCC as an implict def of SI_KILL
When SI_KILL has a register operand, its lowered form writes to vcc.

llvm-svn: 236307
2015-05-01 03:44:09 +00:00
Tom Stellard 0b7feb1cb7 R600/SI: Fix verifier errors from the SIAnnotateControlFlow pass
This pass was generating 'Instruction does not dominate all uses!'
errors for programs which had loops with a condition variable that
depended on the result of a phi instruction from outside of the loop.

The pass was inserting new phi nodes outside of the loop which used values
defined inside the loop.

http://bugs.freedesktop.org/show_bug.cgi?id=90056

llvm-svn: 236306
2015-05-01 03:44:08 +00:00
Pete Cooper 2127b00cd5 [ARM] optimizeSelect should clear kill flags.
If we move an instruction from one block down to a MOVC and predicate it,
then the original instruction could be moved in to a loop.  In this case,
its invalid for any kill flags to remain on there.

Fails with -verfy-machineinstrs.

rdar://problem/20752113

llvm-svn: 236290
2015-04-30 23:57:47 +00:00
Quentin Colombet 329fa890ba [AArch64] Fix bad register class constraint in fast-isel for TST instruction.
rdar://problem/20748715

llvm-svn: 236273
2015-04-30 22:27:20 +00:00
Pete Cooper 5111881cfc Don't always apply kill flag in thumb2 ABS pseudo expansion.
The expansion for t2ABS was always setting the kill flag on the rsb instruction.
It should instead only be set on rsb if it was set on the original ABS instruction.

rdar://problem/20752113

llvm-svn: 236272
2015-04-30 22:15:59 +00:00
Reid Kleckner 60d5232be2 [X86] Use 4 byte preferred aggregate alignment on Win32
This helps reduce the frequency of stack realignment prologues in 32-bit
X86 Windows code. Before this change and the corresponding clang change,
we would take the max of the type preferred alignment and the explicit
alignment on the alloca.

If you don't override aggregate alignment in datalayout, you get a
default of 8. This dates back to 2007 / r34356, and changing it seems
prohibitively difficult at this point.

llvm-svn: 236270
2015-04-30 22:11:59 +00:00
Matt Arsenault d42e017ee4 Mips: Remove dead declaration
llvm-svn: 236250
2015-04-30 19:35:43 +00:00
Quentin Colombet 0a905042cd [ARM] Do not generate invalid encoding for stack adjust, even if this is just
temporary.

Because of that:
1. The machine verifier was complaining on such code.
2. The generate code worked just because the thumb reduction size pass fixed the
opcode.

rdar://problem/20749824

llvm-svn: 236247
2015-04-30 18:52:49 +00:00
Tim Northover 03b99f66d7 AArch64: add BFC alias for the BFI/BFM instructions.
Unlike 32-bit ARM, AArch64 can use wzr/xzr to implement this without the need
for a separate instruction.

rdar://18679590

llvm-svn: 236245
2015-04-30 18:28:58 +00:00
Jan Vesely 808fff585b Reinstate revisions r234755, r234759, r234760
changes:
  Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
  Add location to getConstant
  Drop comment about the ops being turned into expand

llvm-svn: 236240
2015-04-30 17:15:56 +00:00
Daniel Jasper 232778a7a0 Silence unused warning in non-assert builds.
llvm-svn: 236213
2015-04-30 09:01:21 +00:00
Elena Demikhovsky e1eda8a9e6 Masked gather and scatter - added DAGCombine visitors
and AVX-512 instruction selection patterns.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 236211
2015-04-30 08:38:48 +00:00
Simon Pilgrim ecf5875bd5 [SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369).
Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte.

llvm-svn: 236209
2015-04-30 08:23:16 +00:00
Pete Cooper 46361a1ea1 Change x86 CMOVE_F to read it source, not write it.
This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set.

Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit.  So move the operand to the correct place in the td file.

rdar://problem/20751584

llvm-svn: 236183
2015-04-29 23:51:33 +00:00
Douglas Katzman 9160e78ac8 [Sparc] Really add sparcel architecture support.
Mostly copy-and-paste from Sparc v8 architecture.

Differential Revision: http://reviews.llvm.org/D8741

llvm-svn: 236146
2015-04-29 20:30:57 +00:00
Manman Ren 0e20822887 [AArch64] Refactor out codes that depend on specific CS save sequence.
No functionality change.

llvm-svn: 236143
2015-04-29 20:03:38 +00:00
Tim Northover 5211715360 ARM: mark branch-like instructions with correct flags.
There's probably no way to test BXJ, but if the compiler ever did emit it
during CodeGen it would have to be a block terminator so "isBranch" is
appropriate.

BLX is more tricky. Clearly a call, but it affects surprisingly little.

rdar://18719544

llvm-svn: 236140
2015-04-29 19:16:38 +00:00
Douglas Katzman 9cb88b73c6 Make Sparc assembler accept parenthesized constant expressions.
Differential Revision: http://reviews.llvm.org/D9087

llvm-svn: 236137
2015-04-29 18:48:29 +00:00
Zoran Jovanovic 387ce30685 [mips][microMIPSr6] Implement MUL, MUH, MULU and MUHU instructions
Differential Revision: http://reviews.llvm.org/D8894

llvm-svn: 236131
2015-04-29 17:23:22 +00:00
Reid Kleckner c695471365 [X86] Avoid mangling frameescape labels
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.

The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.

llvm-svn: 236123
2015-04-29 16:46:01 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Zoran Jovanovic cca29e8f6e [mips][microMIPSr6] Implement SUB and SUBU instructions
Differential Revision: http://reviews.llvm.org/D8764

llvm-svn: 236118
2015-04-29 16:22:46 +00:00
Zoran Jovanovic 5f34d44354 [mips][microMIPSr6] Implement ADD, ADDU and ADDIU instructions
Differential Revision: http://reviews.llvm.org/D8704

llvm-svn: 236111
2015-04-29 15:11:07 +00:00
James Y Knight c09bdfa4cb Sparc: Prefer reg+reg address encoding when only one register used.
Reg+%g0 is preferred to Reg+imm0 by the manual, and is what GCC produces.

Futhermore, reg+imm is invalid for the (not yet supported) "alternate
address space" instructions.

Differential Revision: http://reviews.llvm.org/D8753

llvm-svn: 236107
2015-04-29 14:54:44 +00:00
Vasileios Kalintiris 1249e74648 Mips fast-isel - handle functions which return i8 or i6 .
Summary: Allow Mips fast-isel to handle functions which return i8/i16 signed/unsigned.

Test Plan:
Make check tests are forthcoming.
Already passes test-suite at O0/O2 for Mips 32 r1/r2

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6765

llvm-svn: 236103
2015-04-29 14:17:14 +00:00
Daniel Sanders 301f937765 [mips] Correct 128-bit shifts on 64-bit targets.
Summary:
The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now
accounts for both cases.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits, mohit.bhakkad, sagar

Differential Revision: http://reviews.llvm.org/D9337

llvm-svn: 236099
2015-04-29 12:28:58 +00:00
Toma Tabacu 79588100d7 [mips] [IAS] Inline assemble-time shifting out of createLShiftOri. NFC.
Summary:
Do the assemble-time shifts from createLShiftOri at the source, which groups all the shifting together, closer to the main logic path, and
store the results in concisely-named variables to improve code clarity.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8973

llvm-svn: 236096
2015-04-29 10:19:56 +00:00
Elena Demikhovsky a9f20495a2 fixed 80-chars; NFC
llvm-svn: 236093
2015-04-29 08:49:57 +00:00
Eric Christopher 0ba41a6841 Reuse a lookup in an assert.
llvm-svn: 236054
2015-04-28 22:38:35 +00:00
Tim Northover e18d662201 ARM: fix peephole optimisation of TST
We were trying to look through COPY instructions, but only to the next
instruction in a BB and incorrectly anyway. The cases where that would actually
be a good idea are rare enough (and not even tested!) that it's not worth
trying to get right.

rdar://20721342

llvm-svn: 236050
2015-04-28 22:03:55 +00:00
James Y Knight e8da8096ec Sparc: Add alternate aliases for conditional branch instructions.
llvm-svn: 236042
2015-04-28 21:27:31 +00:00
Alexei Starovoitov 659ece9ddb [bpf] fix build
Patch by Brenden Blanco.

llvm-svn: 236030
2015-04-28 20:38:56 +00:00
Sanjay Patel f75ee4dc07 [x86] remove RCPPS and RSQRTPS intrinsic instruction definitions
We don't need codegen-only intrinsic instructions for the vector forms of these instructions.

This makes the reciprocal estimate instruction lowering identical to how we handle normal
square roots: (V)SQRTPS / (V)SQRTPD.

No existing regression tests fail with this patch.

Differential Revision: http://reviews.llvm.org/D9301

llvm-svn: 236013
2015-04-28 18:48:45 +00:00
Eric Christopher 35a8a62125 Add a fixme to resetTargetOptions to explain why it needs to go
away.

llvm-svn: 236009
2015-04-28 18:09:05 +00:00
Eric Christopher f4bf3779d8 Fix a [-Werror,-Winconsistent-missing-override] problem in the
NVPTX overrides.

llvm-svn: 236007
2015-04-28 18:06:27 +00:00
Tom Stellard 96301d2455 R600: Fix up for AsmPrinter's OutStreamer being a unique_ptr
Fixes a crash with basically any OpenGL application using the radeonsi
driver.

Patch by: Michel Dänzer

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 236004
2015-04-28 17:37:03 +00:00
Tom Stellard 0a0fa03d5a R600/SI: Add a lower case alias for subtarget feature: +DumpCode
llc converts all feature strings to lower case, while the LLVM C API
does not, so we need a lower case alias in order to test this with llc.

llvm-svn: 236003
2015-04-28 17:37:00 +00:00
Justin Holewinski 3d2a976197 [NVPTX] Handle addrspacecast constant expressions in aggregate initializers
We need to track if an AddrSpaceCast expression was seen when
generating an MCExpr for a ConstantExpr.  This change introduces a
custom lowerConstant method to the NVPTX asm printer that will create
NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode
the information that a given symbol needs to be casted to a generic
address.

llvm-svn: 236000
2015-04-28 17:18:30 +00:00
Sanjay Patel ba55804ea3 move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.

Differential Revision: http://reviews.llvm.org/D9325

llvm-svn: 235997
2015-04-28 16:39:12 +00:00
Elena Demikhovsky 1f7b3644d3 Fixed crash of variable shift inst on AVX2
https://llvm.org/bugs/show_bug.cgi?id=22955

llvm-svn: 235993
2015-04-28 14:46:35 +00:00
Toma Tabacu 7dea2e3982 [mips] [IAS] Do not generate redundant ORi in createLShiftOri.
Summary: If the immediate is 0, the ORi is pointless.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8969

llvm-svn: 235990
2015-04-28 14:06:35 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Toma Tabacu 6114565269 [mips] [IAS] Rename the createShiftOr function to createLShiftOri. NFC.
Summary: The new name is more accurate with regard to the functionality.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8968

llvm-svn: 235984
2015-04-28 13:16:06 +00:00
Toma Tabacu 137d90ab88 [mips] [IAS] Store the expandLoadImm destination register in a variable. NFC.
Summary: This removes multiple calls to getReg() and saves us column space in the source file.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8924

llvm-svn: 235978
2015-04-28 12:04:53 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Elena Demikhovsky ae51853924 AVX-512: Added "pandn" intrinsics set
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235971
2015-04-28 08:12:42 +00:00
Ahmed Bougacha 190528703f [MC] Use LShr for constant evaluation of ">>" on ELF/arm64--darwin.
This matches other assemblers and is less unexpected (e.g. PR23227).
On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both
agree on LShr.  On COFF, I couldn't get my hands on an assembler yet,
so don't change the behavior.  For now, don't change it on non-AArch64
Darwin either, as the other assembler is gas v1.38, which does an AShr.

llvm-svn: 235963
2015-04-28 01:37:11 +00:00
Matthias Braun eec4efcca5 Cleanup, remove unused return value
llvm-svn: 235952
2015-04-28 00:37:05 +00:00
Sanjay Patel ca5ad5fb6d remove obsolete pattern matches for scalar SSE ops
The blendi pattern should always replace the insertps pattern after:
http://reviews.llvm.org/rL232850
http://reviews.llvm.org/rL235124

llvm-svn: 235930
2015-04-27 22:23:17 +00:00
Ahmed Bougacha c004c60c0a [AArch64] Also combine vector selects fed by non-i1 SETCCs.
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.

This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.

Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).

llvm-svn: 235922
2015-04-27 21:43:12 +00:00
Ahmed Bougacha 89bba61c84 [AArch64] Don't assert when combining (v3f32 select (setcc f64)).
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.

llvm-svn: 235917
2015-04-27 21:01:20 +00:00
Bill Schmidt e71db85bed Silence unused variable errors for no-asserts builds
llvm-svn: 235913
2015-04-27 20:22:35 +00:00
Bill Schmidt fe723b9a6d [PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.

However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.

Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.

This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.

Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.

llvm-svn: 235910
2015-04-27 19:57:34 +00:00
Sanjay Patel 8fd573e87f fix 80-cols; NFC
llvm-svn: 235902
2015-04-27 17:45:44 +00:00
Sanjay Patel 912315811e fix typos; NFC
llvm-svn: 235896
2015-04-27 17:03:31 +00:00
Toma Tabacu bda745f532 [mips] Correct bytes to bits in 2 comments. NFC.
llvm-svn: 235891
2015-04-27 15:21:38 +00:00
Elena Demikhovsky a480ef5494 AVX-512: added calling conventions for i1 vectors.
Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724

llvm-svn: 235889
2015-04-27 15:11:19 +00:00
Brendon Cahoon 55bdeb7bc7 [Hexagon] Use constant extenders to fix up hardware loops
Use a loop instruction with a constant extender for a hardware
loop instruction that is too far away from the start of the loop.
This is cheaper than changing the SA register value.

Differential Revision: http://reviews.llvm.org/D9262

llvm-svn: 235882
2015-04-27 14:16:43 +00:00
Toma Tabacu d9d344b485 [mips] [IAS] Improve warning for using AT with .set noat.
Summary:
Changed the warning message to show the current value of $at, similar to what clang does for typedef's, and renamed warnIfAssemblerTemporary to a more descriptive name.

I also changed the type of variables which store registers from int to unsigned, updated the relevant test and tried to make the related comments clearer.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8479

llvm-svn: 235881
2015-04-27 14:05:04 +00:00
Vasileios Kalintiris 7a6b18783f Reapply "[mips][FastISel] Implement shift ops for Mips fast-isel.""
This reapplies r235194, which was reverted in r235495 because it was causing a
failure in our out-of-tree buildbots for MIPS. With the sign-extension patch
in r235718, this patch doesn't cause any problem any more.

llvm-svn: 235878
2015-04-27 13:28:05 +00:00
Toma Tabacu b19cf2082f [mips] [IAS] Rename getATRegNum and setATReg to {g,s}etATRegIndex. NFC.
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8480

llvm-svn: 235877
2015-04-27 13:12:59 +00:00
Elena Demikhovsky d1084c5b3f AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors

llvm-svn: 235875
2015-04-27 12:57:59 +00:00
Simon Pilgrim 4f683c264a [X86][SSE] Add v16i8/v32i8 multiplication support
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.

The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.

Differential Revision: http://reviews.llvm.org/D9115

llvm-svn: 235837
2015-04-27 07:55:46 +00:00
Alexei Starovoitov f26c748b1b [bpf] fix build and remove a compiler warning in Release mode
Patch by Brenden Blanco.

llvm-svn: 235814
2015-04-26 01:58:08 +00:00
Benjamin Kramer a44b37e676 [ARM] Simplify code. NFC.
llvm-svn: 235803
2015-04-25 17:25:13 +00:00
Benjamin Kramer 6246069c89 [hexagon] Use range-based for loops. No functionality change intended.
llvm-svn: 235802
2015-04-25 14:46:53 +00:00
Benjamin Kramer a37c809ce5 [hexagon] Remove setHexLibcallName, it leaks memory.
Just spell out the full names, it's not that much more code.
No functional change intended.

llvm-svn: 235801
2015-04-25 14:46:46 +00:00
Lang Hames 9ff69c8f4d [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.
AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a
reference for this is crufty.

llvm-svn: 235752
2015-04-24 19:11:51 +00:00
Vasileios Kalintiris 1202f36b10 [mips][FastISel] Specify which types we handle for integer extension.
Summary:
Perform integer extension only when the destination type is one of
i8, i16 & i32 and when the source type is i1, i8 or i16. For other
combinations we fall back to SelectionDAG.

This fixes the test MultiSource/Benchmarks/7zip that was failing in our
out-of-tree MIPS buildbots.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9243

llvm-svn: 235718
2015-04-24 13:48:19 +00:00
Jingyue Wu 72fca6c89b Resurrect r235688
We should skip vector types which are not SCEVable.

test/CodeGen/NVPTX/sched2.ll passes

llvm-svn: 235695
2015-04-24 04:22:39 +00:00
Jingyue Wu 62af99b0db Revert r235688
Seems breaking builds

llvm-svn: 235690
2015-04-24 03:26:11 +00:00
Jingyue Wu 312fd0242d [NVPTX] Emits "generic()" depending on the original address space
Summary:
Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()"
on aggregates of addrspacecasts.

Test Plan: addrspacecast-gvar.ll

Reviewers: eliben, jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9130

llvm-svn: 235689
2015-04-24 02:57:30 +00:00
Jingyue Wu 3daace5295 [NVPTX] enable NaryReassociate in NVPTX
Summary:
We run NaryReassociate right after SLSR because SLSR enables many
opportunities for NaryReassociate. For example, in nary-slsr.ll

  foo((a + b) + c);
  foo((a + b * 2) + c);
  foo((a + b * 3) + c);   // 2 muls and 6 adds

after SLSR:

  ab = a + b;
  foo(ab + c);
  ab2 = ab + b;
  foo(ab2 + c);
  ab3 = ab2 + b;
  foo(ab3 + c);           // 6 adds

after NaryReassociate:

  abc = (a + b) + c;
  foo(abc);
  ab2c = abc + b;
  foo(ab2c);
  ab3c = ab2c + b;
  foo(ab3c);              // 4 adds

Test Plan: nary-slsr.ll

Reviewers: jholewinski, eliben

Reviewed By: eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9066

llvm-svn: 235688
2015-04-24 02:54:06 +00:00
Matt Arsenault 5e10016f03 R600/SI: Fix verifier error when producing v_madmk_f32
Copy the kill flags when swapping the operands.

llvm-svn: 235687
2015-04-24 01:57:58 +00:00
Matthias Braun e1a67412cf R600/RegisterCoalescer: Enable more rematerialization/add missing testcase
This enables the rematerialization of some R600 MOV instructions in the
RegisterCoalescer and adds a testcase for r235668.

llvm-svn: 235675
2015-04-24 00:25:50 +00:00
Matt Arsenault a48b866068 R600/SI: Special case v_mov_b32 as really rematerializable
This should be fixed to properly understand all rematerializable
instructions while ignoring implicit reads of exec.

llvm-svn: 235671
2015-04-23 23:34:48 +00:00
Hal Finkel 4dc8fcc224 [PowerPC] Support register name prefixes for vector registers
Match binutils by supporting the optional register name prefix for new vector
registers ("vs" for VSX registers and "q" for QPX registers).

llvm-svn: 235665
2015-04-23 23:16:22 +00:00
Hal Finkel d86e90abdd [PowerPC] Use sync inst alias when printing
So long as the choice between printing msync and sync is not ambiguous, we can
print 'sync 0' and just 'sync'.

llvm-svn: 235663
2015-04-23 23:05:08 +00:00
Tom Stellard ff5cf0e1fd R600: Correctly lower CONCAT_VECTOR nodes with more than 2 operands
llvm-svn: 235662
2015-04-23 22:59:24 +00:00
Hal Finkel fefcfffe68 [PowerPC] Add asm/disasm support for dcbt with hint
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
   dcbt ra, rb, th [server]
   dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).

We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).

llvm-svn: 235657
2015-04-23 22:47:57 +00:00
Krzysztof Parzyszek ed75e7aece Unbreak build
llvm-svn: 235646
2015-04-23 20:57:39 +00:00
Krzysztof Parzyszek 27ba19a177 [Hexagon] Minor cleanup in HexagonFrameLowering
llvm-svn: 235645
2015-04-23 20:42:20 +00:00
Tom Stellard 8b0182af2f R600/SI: Fix indirect addressing with a negative constant offset
When the base register index of the vector plus the constant offset
was less than zero, we were passing the wrong base register to the indirect
addressing instruction.

In this case, we need to set the base register to v0 and then add
the computed (negative) index to m0.

llvm-svn: 235641
2015-04-23 20:32:01 +00:00
Peter Collingbourne 167668f8c8 Thumb2: When applying branch optimizations, visit branches in reverse order.
The order in which branches appear in ImmBranches is approximately their
order within the function body. By visiting later branches first, we reduce
the distance between earlier forward branches and their targets, making it
more likely that the cbn?z optimization, which can only apply to forward
branches, will succeed for those earlier branches.

Differential Revision: http://reviews.llvm.org/D9185

llvm-svn: 235640
2015-04-23 20:31:35 +00:00
Peter Collingbourne cfee5b04bc ARM: When re-creating a branch via InsertBranch, preserve CPSR flags.
In particular, this preserves the kill flag, which allows the Thumb2 cbn?z
optimization to be applied in cases where a branch has been re-created after
the live variables analysis pass, e.g. by the machine block placement pass.

This appears to be low risk; a number of other targets seem to already be
doing something similar, e.g. AArch64, PowerPC.

Differential Revision: http://reviews.llvm.org/D9184

llvm-svn: 235639
2015-04-23 20:31:32 +00:00
Peter Collingbourne 6529523151 Thumb2: When optimizing for size, do not if-convert branches involving comparisons with zero.
This allows the constant island pass to lower these branches to cbn?z
instructions, resulting in a shorter instruction sequence.

Differential Revision: http://reviews.llvm.org/D9183

llvm-svn: 235638
2015-04-23 20:31:30 +00:00
Peter Collingbourne 78f1ecc59c ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.
This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around 4 bytes per function.

Differential Revision: http://reviews.llvm.org/D9165

llvm-svn: 235637
2015-04-23 20:31:26 +00:00
Peter Collingbourne 1213918bf4 ARM: Only enforce 4-byte alignment on Thumb-2 functions with constant pools.
This appears to have been introduced back in r76698 as part of an unrelated
change. I can find no official ARM documentation stating that Thumb-2 functions
require 4-byte alignment; in fact, ARM documentation appears to contradict
this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement,
section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions.").

Also remove code that sets alignment for ARM functions, which is redundant
with code in the MachineFunction constructor, and remove the hidden
-arm-align-constant-islands flag, which has been enabled by default since
r146739 (Dec 2011) and has probably received sufficient testing by now.

Differential Revision: http://reviews.llvm.org/D9138

llvm-svn: 235636
2015-04-23 20:31:22 +00:00
Krzysztof Parzyszek e568967986 [Hexagon] Fix compiler warnings in release build
Patch by Aditya Nandakumar.

llvm-svn: 235635
2015-04-23 20:26:21 +00:00
Jingyue Wu 3286ec1484 [NVPTX] run SeparateConstOffsetFromGEP before SLSR
Summary:
We pick this order because SeparateConstOffsetFromGEP may create more
opportunities for SLSR.

Test Plan:
reassociate-geps-and-slsr.ll
no performance regression on internal benchmarks

Reviewers: meheff

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D9230

llvm-svn: 235632
2015-04-23 20:00:04 +00:00
Tom Stellard d1f0f0268c R600/SI: Add assembler support for all CI and VI VOP1 instructions
llvm-svn: 235629
2015-04-23 19:33:54 +00:00
Tom Stellard 4b3e755480 R600/SI: v_mov_fed_b32 does not exist on VI
llvm-svn: 235628
2015-04-23 19:33:52 +00:00
Tom Stellard 21cce29041 R600/SI: Use a better error message for unsupported instructions in the assembler
llvm-svn: 235627
2015-04-23 19:33:51 +00:00
Tom Stellard 7130ef49cb R600/SI: Improve AsmParser support for forced e64 encoding
We can now force e64 encoding even when the operands would be legal
for e32 encoding.

llvm-svn: 235626
2015-04-23 19:33:48 +00:00
Hal Finkel 7c5cb066d0 [PowerPC] Enable printing instructions using aliases
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.

Thus, after some hours of updating test cases...

llvm-svn: 235616
2015-04-23 18:30:38 +00:00
Pirama Arumuga Nainar 745615ca00 [AArch64] Add nvcast patterns for v4f16 and v8f16
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants.  This patch adds nvcast rules with v4f16 and v8f16 values.

AArchISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.

Reviewers: jmolloy, srhines, ab

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9201

llvm-svn: 235610
2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar b18815354d [AArch64] Handle vec4, vec8, vec16 *itofp for half
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.

Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors.  Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.

The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64.  I am
adding a test here for completeness.

Reviewers: aemerson, rengolin, ab, jmolloy, srhines

Subscribers: rengolin, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D9166

llvm-svn: 235609
2015-04-23 17:16:27 +00:00
Hans Wennborg 0867b151c9 Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.

llvm-svn: 235608
2015-04-23 16:45:24 +00:00
Krzysztof Parzyszek 876a19d855 [Hexagon] Shrink-wrap stack frame (Hexagon-specific)
llvm-svn: 235603
2015-04-23 16:05:39 +00:00
Toma Tabacu 7fc89d2141 [mips] [IAS] Move NOP emission after pseudo-instruction expansion. NFC.
As suggested in the review for http://reviews.llvm.org/D8537.

llvm-svn: 235601
2015-04-23 14:48:38 +00:00
Aaron Ballman 0be238cebd Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range.
llvm-svn: 235597
2015-04-23 13:41:59 +00:00
Hans Wennborg 15823d49b6 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a re-commit of r235101, which also fixes the problems with the previous patch:

- Switches with only a default case and non-fallthrough were handled incorrectly

- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.

> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235560
2015-04-22 23:14:56 +00:00
Krzysztof Parzyszek 952d951418 [Hexagon] Some cleanup of instruction selection code
llvm-svn: 235552
2015-04-22 21:17:00 +00:00
Krzysztof Parzyszek cd97c985c7 [Hexagon] Use A2_tfrsi for constant pool and jump table addresses
llvm-svn: 235535
2015-04-22 18:25:53 +00:00
Pete Cooper 037b700b7f [AArch64] Use MachineRegisterInfo instead of LiveIntervals to calculate liveness. NFC.
The CondOpt pass currently uses LiveIntervals to set the dead flag on a def.  This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead.

This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64.

Reviewed by Chad Rosier and Zhaoshi.

llvm-svn: 235532
2015-04-22 18:05:13 +00:00
Krzysztof Parzyszek 05902163b6 [Hexagon] Consider constant-extended offsets to be valid
llvm-svn: 235529
2015-04-22 17:51:26 +00:00
Krzysztof Parzyszek 9ee04e401a Fix Windows build break: use LLVM_FUNCTION_NAME instead of __func__.
llvm-svn: 235525
2015-04-22 17:19:44 +00:00
Matt Arsenault deaef8e24b R600: Fix always inline pass breaking noinline functions
No test since calls are not actually supported yet.

llvm-svn: 235524
2015-04-22 17:10:44 +00:00
Krzysztof Parzyszek 4fa2a9f7fd [Hexagon] Overhaul of stack object allocation
- Use static allocation for aligned stack objects.
- Simplify dynamic stack object allocation.
- Simplify elimination of frame-indices.

llvm-svn: 235521
2015-04-22 16:43:53 +00:00
Sanjay Patel cab567873f [x86] Add store-folded memop patterns for vcvtps2ph
Differential Revision: http://reviews.llvm.org/D7296

llvm-svn: 235517
2015-04-22 16:11:19 +00:00
Krzysztof Parzyszek 6bbcb31fda [Hexagon] Treat CFI as solo instructions
llvm-svn: 235516
2015-04-22 15:47:35 +00:00
Krzysztof Parzyszek badf3a6356 [Hexagon] Implement HexagonInstPrinter::printRegName
llvm-svn: 235514
2015-04-22 15:38:17 +00:00
Andrea Di Biagio 6cd2f42fac [X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259).
This fixes a regression introduced at revision 218263.

On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.

Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.

However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.

This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.

llvm-svn: 235509
2015-04-22 14:53:39 +00:00
Zoran Jovanovic b59a541926 [mips][microMIPSr6] Implement mips32 to microMIPSr6 mapping support
Differential Revision: http://reviews.llvm.org/D8661

llvm-svn: 235505
2015-04-22 13:27:34 +00:00
Vasileios Kalintiris e7508c9fc7 Revert "[mips][FastISel] Implement shift ops for Mips fast-isel."
This reverts commit r235194. It was causing a failure in FastISel buildbots
due to sign-extension issues.

llvm-svn: 235495
2015-04-22 10:08:46 +00:00
James Molloy cd2334e86e [AArch64] Disable complex GEP optimization by default.
Enough concerns were raised that this optimization is pessimising some code patterns.

The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.

llvm-svn: 235491
2015-04-22 09:11:38 +00:00
Lang Hames 65613a634a [patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.

The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.

llvm-svn: 235483
2015-04-22 06:02:31 +00:00
Sanjay Patel fe1365ac50 [x86] allow 64-bit extracted vector element integer stores on a 32-bit system
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.

So instead of producing this:

  pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
  movd	%xmm0, (%eax)
  movd	%xmm1, 4(%eax)

We can do:

  movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.

Differential Revision: http://reviews.llvm.org/D9134

llvm-svn: 235460
2015-04-22 00:24:30 +00:00
Krzysztof Parzyszek 499bc5faa1 [Hexagon] Patterns for frame index with offset for isel
llvm-svn: 235418
2015-04-21 21:28:03 +00:00
Jingyue Wu 66a161f05e [NVPTX] do not run DCE after SLSR and SeparateConstOffsetFromGEP
Summary:
With D9096 and D9101, there's no need to run DCE after SLSR and
SeparateConstOffsetFromGEP.

Test Plan: no regression

Reviewers: jholewinski, meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D9172

llvm-svn: 235415
2015-04-21 20:47:15 +00:00
Matthias Braun 9e9e8b3230 X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.

Differential Revision: http://reviews.llvm.org/D9137

llvm-svn: 235394
2015-04-21 17:21:36 +00:00
Elena Demikhovsky 0e6d6d54ce AVX-512: Added VPMOVx2M instructions for SKX,
fixed encoding of VPMOVM2x.

llvm-svn: 235385
2015-04-21 14:38:31 +00:00
Elena Demikhovsky 431b81e41f AVX-512: Added VPTESTM and VPTESTNM instructions for SKX
llvm-svn: 235383
2015-04-21 13:13:46 +00:00
Toma Tabacu 11e14a9467 [mips] [IAS] Implement the .asciiz directive.
Summary:
This directive is exactly the same as .asciz, except it's only used by MIPS.
It is used to store null terminated strings in object files.

Reviewers: rafael, dsanders, echristo

Reviewed By: dsanders, echristo

Subscribers: echristo, llvm-commits

Differential Revision: http://reviews.llvm.org/D7530

llvm-svn: 235382
2015-04-21 11:50:52 +00:00
Jozef Kolek 8e086cedfa [mips][microMIPSr6] Implement CACHE and PREF instructions
Implement CACHE and PREF instructions using mapping.

Differential Revision: http://reviews.llvm.org/D8893

llvm-svn: 235379
2015-04-21 11:17:25 +00:00
Vasileios Kalintiris 41b0100dea [mips] Cleanup old floating-point flag conditions definitions. NFC.
Reviewers: dsanders

Differential Revision: http://reviews.llvm.org/D7947

llvm-svn: 235377
2015-04-21 10:53:57 +00:00
Vasileios Kalintiris 32177d6bec [mips] Optimize code generation for 64-bit variable shift instructions.
Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7413

llvm-svn: 235376
2015-04-21 10:49:03 +00:00
Elena Demikhovsky 50b88ddb87 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235375
2015-04-21 10:27:40 +00:00
Simon Pilgrim 398ce22b86 [X86][SSE] Provide execution domains for scalar floating point operations
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.

I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.

Differential Revision: http://reviews.llvm.org/D9095

llvm-svn: 235372
2015-04-21 08:40:22 +00:00
Matthias Braun b6b5aaad98 X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

llvm-svn: 235367
2015-04-21 01:13:41 +00:00
Pirama Arumuga Nainar 34056dea1b [MIPS] OperationAction for FP_TO_FP16, FP16_TO_FP
Summary:
Set operation action for FP16 conversion opcodes, so the Op legalizer
can choose the gnu_* libcalls for Mips.

Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to
prevent (fpext (load )) and (store (fptrunc)) from getting combined into
unsupported operations.

Added test cases to test that these operations are handled correctly
for f16 scalars and vectors.  This patch depends on
http://reviews.llvm.org/D8755.

Reviewers: srhines

Subscribers: llvm-commits, ab

Differential Revision: http://reviews.llvm.org/D8804

llvm-svn: 235341
2015-04-20 20:15:36 +00:00
Jozef Kolek 207d248eba [mips][microMIPSr6] Implement BITSWAP instruction
Implement BITSWAP instruction using mapping.

Differential Revision: http://reviews.llvm.org/D8857

llvm-svn: 235321
2015-04-20 18:14:59 +00:00
Vladimir Sukharev bad1d1dc02 [AArch64] LORID_EL1 register must be treated as read-only
Patch by: John Brawn

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9105

llvm-svn: 235314
2015-04-20 16:54:37 +00:00
Bill Schmidt 6779075c44 [PowerPC] Flow oversized lines for r235309
llvm-svn: 235310
2015-04-20 15:58:46 +00:00
Bill Schmidt 1962f709c7 [PowerPC] Add future work for vector insert/extract to README_ALTIVEC.txt
llvm-svn: 235309
2015-04-20 15:54:26 +00:00
Jozef Kolek 676d60125c [mips][microMIPSr6] Implement disassembler support
Implement disassembler support for microMIPS32r6.

Differential Revision: http://reviews.llvm.org/D8490

llvm-svn: 235307
2015-04-20 14:40:38 +00:00
Jozef Kolek 5de4a6c0af [mips][microMIPSr6] Implement BALC and BC instructions
This patch implements BALC and BC instructions using mapping.

Differential Revision: http://reviews.llvm.org/D8388

llvm-svn: 235302
2015-04-20 13:04:14 +00:00
Jozef Kolek 6ca13eaf82 [mips][microMIPSr6] Implement initial mapping support
Differential Revision: http://reviews.llvm.org/D8387

llvm-svn: 235298
2015-04-20 12:42:08 +00:00
Jozef Kolek c22555d977 [mips][microMIPSr6] Implement initial subtarget support
Differential Revision: http://reviews.llvm.org/D8386

llvm-svn: 235296
2015-04-20 12:23:06 +00:00
Andrea Di Biagio 98c367093d [X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.

Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.

So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.

Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.

llvm-svn: 235295
2015-04-20 11:56:59 +00:00
Simon Pilgrim 749953eebb [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

llvm-svn: 235281
2015-04-19 22:16:49 +00:00
Craig Topper 43d413b698 Remove unnecessary include and probably a layering violation.
llvm-svn: 235262
2015-04-19 00:57:33 +00:00
Ahmed Bougacha e14a4d487e [AArch64] Don't force MVT::Untyped when selecting LD1LANEpost.
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1.  Don't force Untyped.
Instead, just use the type of the reg sequence.

This mirrors the behavior of createTuple, which feeds the LD1*_POST.

The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.

The only case where it does run on V64 vectors is if the vector ops
legalizer ran.  So, tickle the code with a ctpop.

Fixes PR23265.

llvm-svn: 235243
2015-04-17 23:43:33 +00:00
Ahmed Bougacha 2448ef5f33 [AArch64] Avoid vector->load dependency cycles when creating LD1*post.
They would break the SelectionDAG.
Note that the opposite load->vector dependency is already obvious in:
  (LD1*post vec, ..)

llvm-svn: 235224
2015-04-17 21:02:30 +00:00
Vasileios Kalintiris 816ea84e7a [mips][FastISel] Implement FastMaterializeAlloca in Mips fast-isel.
Summary: Implement the method FastMaterializeAlloca in Mips fast-isel

Based on a patch by Reed Kotler.

Test Plan:
Passes test-suite at O0/O2 for mips32 r1/r2
fastalloca.ll

Reviewers: dsanders, rkotler

Subscribers: rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6742

llvm-svn: 235213
2015-04-17 17:29:58 +00:00
Sanjay Patel 2161c49a4e [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

Review:
http://reviews.llvm.org/D8691

llvm-svn: 235210
2015-04-17 17:02:37 +00:00
Vasileios Kalintiris a4035e6284 [mips][FastISel] Implement shift ops for Mips fast-isel.
Summary:
Add shift operators implementation to fast-isel for Mips.  These are shift ops
for non legal forms, i.e. i8 and i16.

Based on a patch by Reed Kotler.

Test Plan:

Reviewers: dsanders

Subscribers: echristo, rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6726

llvm-svn: 235194
2015-04-17 14:29:21 +00:00
Rafael Espindola 7f4e07befc Move AliasedSymbol to MachObjectWriter.
It was only used by MachO.
Part of pr19627.

llvm-svn: 235185
2015-04-17 12:28:43 +00:00
Vasileios Kalintiris bb60cfb5c4 [mips] Teach the delay slot filler to remove needless KILL instructions.
Summary:
Previously, the presence of KILL instructions would block valid candidates
from filling a specific delay slot. With the elimination of the KILL
instructions, in the appropriate range, we are able to fill more slots and
keep the information from future def/use analysis consistent.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D7724

llvm-svn: 235183
2015-04-17 12:01:02 +00:00
Benjamin Kramer 97fbdd5a39 [mc] Clean up emission of byte sequences
No functional change intended.

llvm-svn: 235178
2015-04-17 11:12:43 +00:00
Daniel Sanders 81eb66c992 [mips] Move ABI-dependent register selections to MipsABIInfo. NFC.
Summary:
For example, a common idiom was 'isN64 ? Mips::SP_64 : Mips::SP'. This has
been moved to MipsABIInfo and replaced with 'ABI.GetStackPtr()'.

There are others that should also be moved. This patch sticks to the ones that
are obviously non-functional. The others have minor mistakes that need fixing
at the same time, mostly involving checks for 64-bit GPR's instead of checks
for 64-bit pointers.

Reviewers: tomatabacu

Reviewed By: tomatabacu

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8972

llvm-svn: 235173
2015-04-17 09:50:21 +00:00
Ahmed Bougacha 941420d9ea [AArch64] Don't assert on f16 in DUP PerfectShuffle generator.
Found by code inspection, but breaking i16 at least breaks other tests.
They aren't checking this in particular though, so also add some
explicit tests for the already working types.

llvm-svn: 235148
2015-04-16 23:57:07 +00:00
Pete Cooper 19d704d13c Disable AArch64 fast-isel on big-endian call vector returns.
A big-endian vector return needs a byte-swap which we aren't doing right now.

For now just bail on these cases to get correctness back.

llvm-svn: 235133
2015-04-16 21:19:36 +00:00
Vladimir Sukharev 6334cf3d69 [AArch64] Add v8.1a "Virtualization Host Extensions"
Reviewers: t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8500

Patch by: Tom Coxon

llvm-svn: 235107
2015-04-16 15:38:58 +00:00
Vladimir Sukharev d49cb8fdd7 [AArch64] Add v8.1a "Limited Ordering Regions" extension
Reviewers: 	t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8499

Patch by: Tom Coxon

llvm-svn: 235105
2015-04-16 15:30:43 +00:00
Vladimir Sukharev 251ce0c2db [AArch64] Add v8.1a "Privileged Access Never" extension
Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8498

llvm-svn: 235104
2015-04-16 15:20:51 +00:00
Vladimir Sukharev a11db3eb88 [AArch64] Handle Cyclone-specific register in common way
Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8584

Patch by: Tom Coxon

llvm-svn: 235102
2015-04-16 15:01:20 +00:00
Vladimir Sukharev 950b606a2b [AArch64] Follow-up to: Refactor AArch64NamedImmMapper to become dependent on subtarget features
Fixed compilation with clang on some buildbots with "-Werror -Wmissing-field-initializers"

Related to: http://reviews.llvm.org/rL235089

llvm-svn: 235099
2015-04-16 14:36:13 +00:00
Toma Tabacu 2cc44f50a5 [mips] [IAS] Preserve microMIPS label marking for objects when assigning.
Summary: Previously, this was only happening for functions, but because of .insn, objects can also be marked now.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8007

llvm-svn: 235095
2015-04-16 13:37:32 +00:00
Benjamin Kramer 727b505161 [Mips] Use unique_ptr to manage ownership.
Required some tweaking of ValueMap to accommodate a move-only value
type. No functional change intended.

llvm-svn: 235091
2015-04-16 12:43:33 +00:00
Benjamin Kramer 90a84a33f6 Make it obvious that we're iterating over a range of pointers.
Found by -Wrange-loop-analysis.

llvm-svn: 235090
2015-04-16 12:43:07 +00:00
Vladimir Sukharev a98f6897a2 [AArch64] Refactor AArch64NamedImmMapper to become dependent on subtarget features.
In order to introduce v8.1a-specific entities, Mappers should be aware of SubtargetFeatures available.

This patch introduces refactoring, that will then allow to easily introduce:

- v8.1-specific "pan" PState for PStateMapper (PAN extension)

- v8.1-specific sysregs for SysRegMapper (LOR,VHE extensions)

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8496

Patch by Tom Coxon

llvm-svn: 235089
2015-04-16 12:15:27 +00:00
James Molloy f8aa57aa3b [AArch64] Fix invalid use of references to BuildMI.
This was found in GCC PR65773 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65773).

We shouldn't be taking a reference to the temporary that BuildMI returns, we must copy it.

llvm-svn: 235088
2015-04-16 11:37:40 +00:00
Vladimir Sukharev 0e0f8d2c1f [ARM] Add v8.1a "Privileged Access Never" extension
Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8504

llvm-svn: 235087
2015-04-16 11:34:25 +00:00
Toma Tabacu 9ca5096f59 [mips] [IAS] Add support for the .insn directive.
Summary:
This assembler directive marks the current label as an instruction label in microMIPS and MIPS16.

This initial implementation works only for microMIPS.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8006

llvm-svn: 235084
2015-04-16 09:53:47 +00:00
Duncan P. N. Exon Smith b273d06b63 DebugInfo: Gut DIScope, DIEnumerator and DISubrange
The only class the still has API left is `DIDescriptor` itself.

llvm-svn: 235067
2015-04-16 01:37:00 +00:00
Duncan P. N. Exon Smith 35ef22cf53 DebugInfo: Gut DICompileUnit and DIFile
Continuing gutting `DIDescriptor` subclasses; this edition,
`DICompileUnit` and `DIFile`.  In the name of PR23080.

llvm-svn: 235055
2015-04-15 23:19:27 +00:00
Charlie Turner 6f13d0ca84 Fix BXJ is undefined in AArch32.
BXJ was incorrectly said to be unsupported in ARMv8-A. It is not
supported in the A64 instruction set, but it is supported in the T32
and A32 instruction sets, because it's listed as an instruction in the
ARM ARM section F7.1.28.

Using SP as an operand to BXJ changed from UNPREDICTABLE to
PREDICTABLE in v8-A. This patch reflects that update as well.

This was found by MCHammer.

llvm-svn: 235024
2015-04-15 17:28:23 +00:00
Sanjay Patel c03d93baa0 [X86] add an exedepfix entry for movq == movlps == movlpd
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.

It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.

Differential Revision: http://reviews.llvm.org/D8691

llvm-svn: 235014
2015-04-15 15:47:51 +00:00
Sanjay Patel 7024b8121a [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]

Differential Revision: http://reviews.llvm.org/D8941

llvm-svn: 235012
2015-04-15 15:22:55 +00:00
Daniel Sanders 93ea6ab136 [msp430] Only support the 'm' inline assembly memory constraint. NFC.
Summary:
MSP430 doesn't seem to have any additional constraints. Therefore remove
the target hook.

No functional change intended.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8208

llvm-svn: 235003
2015-04-15 12:51:28 +00:00
Toma Tabacu 89a712b0be [mips] [IAS] Refactor the function which checks for the availability of AT. NFC.
Summary:
Refactor MipsAsmParser::getATReg to return an internal register number instead of a register index.
Also change all the int's to unsigned, seeing as the current AT register index is stored as an unsigned in MipsAssemblerOptions.



Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8478

llvm-svn: 234996
2015-04-15 10:48:56 +00:00
Alexei Starovoitov 1a8102e020 [bpf] fix build
fix build due to refactoring in DIL/MDL and raw_pwrite_stream

llvm-svn: 234971
2015-04-15 02:48:57 +00:00
Richard Trieu 6b1aa5f5e1 Change range-based for-loops to be -Wrange-loop-analysis clean.
No functionality change.

llvm-svn: 234963
2015-04-15 01:21:15 +00:00
Rafael Espindola 5560a4cfbd Use raw_pwrite_stream in the object writer/streamer.
The ELF object writer will take advantage of that in the next commit.

llvm-svn: 234950
2015-04-14 22:14:34 +00:00
Ed Maste 8ed40ce56d Correct 'teh' and other typos / repeated words.
Patch by Eitan Adler.

Differential Revision:	http://reviews.llvm.org/D8514

llvm-svn: 234939
2015-04-14 20:52:58 +00:00
Alexander Kornienko fb37cfa346 Refactor: Simplify boolean expressions in ARM target
Simplify boolean expressions using `true` and `false` with `clang-tidy`

http://reviews.llvm.org/D8524

Patch by Richard Thomson!

llvm-svn: 234901
2015-04-14 15:32:58 +00:00
Bradley Smith b913653b91 [AArch64] Allow non-standard INS/DUP encodings
The ARMv8 ARMARM states that for these instructions in A64 state:

  "Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS).

Make the disassembler accept any encoding with these ignored bits set to 1.

llvm-svn: 234896
2015-04-14 15:07:26 +00:00
Tom Stellard d4a1950500 R600/SI: Fix verifier error caused by SIAnnotateControlFlow
This pass will always try to insert llvm.SI.ifbreak intrinsics
in the same block that its conditional value is computed in.  This is
a problem when conditions for breaks or continue are computed outside
of the loop, because the llvm.SI.ifbreak intrinsic ends up being inserted
outside of the loop.

This patch fixes this problem by inserting the llvm.SI.ifbreak
intrinsics in the loop header when the condition is computed outside
the loop.

llvm-svn: 234891
2015-04-14 14:36:45 +00:00
Petar Jovanovic 0380d0b88f Re-enable target-specific relocation table sorting and use it for Mips
Some targets (ie. Mips) have additional rules for ordering the relocation
table entries. Allow them to override generic sortRelocs(), which sorts
entries by Offset.
Then override this function for Mips, to emit HI16 and GOT16 relocations
against the local symbol in pair with the corresponding LO16 relocation.

Patch by Vladimir Stefanovic.

Differential Revision: http://reviews.llvm.org/D7414

llvm-svn: 234883
2015-04-14 13:23:34 +00:00
Duncan P. N. Exon Smith 537b4a8159 DebugInfo: Gut DISubprogram and DILexicalBlock*
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses.  Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.

llvm-svn: 234850
2015-04-14 03:40:37 +00:00
Duncan P. N. Exon Smith 7348ddaa74 DebugInfo: Gut DIVariable and DIGlobalVariable
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`.  Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).

llvm-svn: 234840
2015-04-14 02:22:36 +00:00
Krzysztof Parzyszek 2c4487de6d Expand ADDO/SUBO on Hexagon
llvm-svn: 234795
2015-04-13 20:37:01 +00:00
Jan Vesely ffcd968647 Revert revisions r234755, r234759, r234760
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"

Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.

llvm-svn: 234768
2015-04-13 17:47:15 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Jan Vesely d579048464 R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW
v4: fixup test cmdline
    add known bits computation
    use sign extend instead of sub 0,x
    better add test
v5: remove redundant break
    move lowering to separate functions
    fix comments

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: arsenm
llvm-svn: 234759
2015-04-13 16:26:00 +00:00
Jan Vesely 452b036697 R600: Make FMIN/MAXNUM legal on all asics
v2: Add tests

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
reviewer: arsenm
llvm-svn: 234716
2015-04-12 23:45:05 +00:00
Jan Vesely 811ef52db7 R600: remove manual BFE optimization
Fixed since r233079

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
reviewer: arsenm
llvm-svn: 234715
2015-04-12 23:45:01 +00:00
Hal Finkel 5551f2528c [PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.

llvm-svn: 234705
2015-04-12 17:18:56 +00:00
Hal Finkel 58f5f9c393 [PowerPC] Disable part-word atomics on the P7
As it turns out, even though these are part of ISA 2.06, the P7 does not
support them (or, at least, not any P7s we're tested so far).

llvm-svn: 234686
2015-04-11 13:40:36 +00:00
Nemanja Ivanovic c38b5311cb Add direct moves to/from VSR and exploit them for FP/INT conversions
This patch corresponds to review:
http://reviews.llvm.org/D8928

It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.

llvm-svn: 234682
2015-04-11 10:40:42 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Hal Finkel ffb460fdf0 [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.

llvm-svn: 234670
2015-04-11 00:33:08 +00:00
Ahmed Bougacha b96444efd1 [CodeGen] Split -enable-global-merge into ARM and AArch64 options.
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.

Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.

llvm-svn: 234666
2015-04-11 00:06:36 +00:00
Quentin Colombet fd7475b5e8 [AArch64] Strengthen the code for the prologue insertion.
The spilled registers are pristine and thus, correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.

llvm-svn: 234664
2015-04-10 23:14:34 +00:00
Hal Finkel a9fceb803d [PowerPC] Prefetching should also consider depth > 1 loops
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.

llvm-svn: 234603
2015-04-10 15:05:02 +00:00
Toma Tabacu ae47f93b74 [mips] [IAS] Improve comments in MipsAsmParser::expandLoadImm. NFC.
llvm-svn: 234595
2015-04-10 13:28:16 +00:00
Chad Rosier 518659d9b4 [AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions.  However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.

http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 234594
2015-04-10 13:19:27 +00:00
Chad Rosier a82c876045 [AArch64] Adjusts Cortex-A57 machine model to handle zero shift.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 234593
2015-04-10 13:19:21 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
Jingyue Wu 5da831cc31 Divergence analysis for GPU programs
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.

Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll

Reviewers: resistor, hfinkel, eliben, meheff, jholewinski

Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8576

llvm-svn: 234567
2015-04-10 05:03:50 +00:00
Hal Finkel 93138503ae [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).

Fixes PR23173.

llvm-svn: 234561
2015-04-10 03:39:00 +00:00
Ahmed Bougacha 1ffe7c7d36 [AArch64] Promote f16 operations to f32.
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.

Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.

f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.

While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.

Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755

llvm-svn: 234550
2015-04-10 00:08:48 +00:00
Nemanja Ivanovic c09047916a Add LLVM support for remaining integer divide and permute instructions from ISA 2.06
This is the patch corresponding to review:
http://reviews.llvm.org/D8406

It adds some missing instructions from ISA 2.06 to the PPC back end.

llvm-svn: 234546
2015-04-09 23:54:37 +00:00
Rafael Espindola 5682ce2ceb Simplify use of formatted_raw_ostream.
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.

It is used only for asm printing.

This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:

* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)

llvm-svn: 234535
2015-04-09 21:06:08 +00:00
Juergen Ributzka bd0c7eb4dc [AArch64][FastISel] Fix integer extend optimization.
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.

The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.

This fixes rdar://problem/20470788.

llvm-svn: 234529
2015-04-09 20:00:46 +00:00
Eric Christopher fbe80f5f63 Remove duplicated code and consolidate initializers.
llvm-svn: 234525
2015-04-09 19:20:37 +00:00
Rafael Espindola 49286e9f4a clang-format bits of code to make a followup patch easy to read.
llvm-svn: 234519
2015-04-09 18:32:58 +00:00
Rafael Espindola f546d0f9c5 Use a raw_svector_ostream instead of a raw_string_ostream.
It saves a bit of copying.

llvm-svn: 234507
2015-04-09 17:16:25 +00:00
Rafael Espindola df7305a438 Don't repeat name in comment. NFC.
llvm-svn: 234506
2015-04-09 17:10:57 +00:00
Rafael Espindola ee0dd4d289 This reverts commit r234460 and r234461.
Revert "Add classof implementations to the raw_ostream classes."
Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream."

The underlying issue can be fixed without classof.

llvm-svn: 234495
2015-04-09 15:54:59 +00:00
Javed Absar 5c5e3c5e36 [ARM] support for Cortex-R4/R4F
Currently, llvm (backend) doesn't know cortex-r4, even though it is the
default target for armv7r. Using "--target=armv7r-arm-none-eabi" provokes
'cortex-r4' is not a recognized processor for this target' by llvm.
This patch adds support for cortex-r4 and, very closely related, r4f.

llvm-svn: 234486
2015-04-09 14:07:28 +00:00
Toma Tabacu be218927f8 [mips] Refactor saved-registers bitmask creation in MipsAsmPrinter::printSavedRegsBitmask. NFC.
Summary:
Make the code more readable by fusing the for-loops together and explicitly checking for each register class.

Also, this version is more straightforward because it doesn't assume that FPU registers always come before CPU registers in the CalleeSavedInfo vector.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8033

llvm-svn: 234475
2015-04-09 10:54:16 +00:00
Kristof Beyls 17cb8982f4 [AArch64] Add support for dynamic stack alignment
Differential Revision: http://reviews.llvm.org/D8876

llvm-svn: 234471
2015-04-09 08:49:47 +00:00
Lang Hames 522bf13b83 [AArch64] Remove redundant -march option. Also fix a think-o from r234462.
llvm-svn: 234467
2015-04-09 05:34:57 +00:00
Lang Hames 903338511b [AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.

This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.

<rdar://problem/17829180>

llvm-svn: 234462
2015-04-09 03:40:33 +00:00
Rafael Espindola 132381f981 Use the cast machinery to remove dummy uses of formatted_raw_ostream.
If we know we are producing an object, we don't need to wrap the stream
in a formatted_raw_ostream anymore.

llvm-svn: 234461
2015-04-09 02:28:12 +00:00
Scott Douglass 7ad7792088 [ARM] make vminnm/vmaxnm work with ?le, ?ge and no-nans-fp-math
Because -menable-no-nans causes fcmp conditions to be rewritten
without 'o' or 'u' the recognition code in needs to cope. Also
extended it to handle 'le' and 'ge.

Differential Revision: http://reviews.llvm.org/D8725

llvm-svn: 234421
2015-04-08 17:18:28 +00:00
Toma Tabacu c6ce0749ad [mips] [IAS] Do not generate redundant move when expanding lw/sw with symbol.
Summary:
Even though there is no 2nd register operand in the "lw/sw $8, symbol" case, we still try to find one, 
and we end up with $0, which makes us generate an unnecessary "addu $8, $8, $0" (a.k.a. "move $8, $8").

We can avoid this by checking if the 2nd register operand is different from $0, before generating the addu.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8055

llvm-svn: 234406
2015-04-08 13:52:41 +00:00
Toma Tabacu 91fc0b3c10 [mips] [IAS] Add support for the BNEZL and BEQZL pseudo-instructions.
Summary:
They are of the form "bnezl/beqzl $rs, offset" and expand to "bnel/beql $rs, $zero, offset".

These instructions are used in Linux inline assembly.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8540

llvm-svn: 234401
2015-04-08 12:15:05 +00:00
Sergey Dmitrouk 3cc62b3715 [ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame
Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for early `return` in `ARMFrameLowering::emitPrologue`, which leads to loosing `.cfi_def_cfa_offset` directive for functions without stack frame.

Reviewers: echristo, rengolin, asl, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, rengolin, aemerson

Differential Revision: http://reviews.llvm.org/D8606

llvm-svn: 234399
2015-04-08 10:10:12 +00:00
Toma Tabacu 7567a10c47 [mips] [IAS] Remove AssemblerPredicate's from RelocPIC and RelocStatic.
Summary:
These AssemblerPredicate's are unnecessary and actually make some instructions unusable when assembling pre-MIPS32 ISAs.
For example, this was causing the IAS to reject the 'j' instruction for MIPS I-V.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8300

llvm-svn: 234398
2015-04-08 10:06:45 +00:00
Alexei Starovoitov f28ede810c [bpf] support BPF backend as shared library
dependencies were not set correctly for shared library build.
static was ok

Patch by Brenden Blanco.

llvm-svn: 234386
2015-04-08 03:46:51 +00:00
Tom Stellard 80e169a435 R600/SI: Add some missing overrides
llvm-svn: 234384
2015-04-08 02:07:05 +00:00
Tom Stellard d7e6f13671 R600/SI: Initial support for assembler and inline assembly
This is currently considered experimental, but most of the more
commonly used instructions should work.

So far only SI has been extensively tested, CI and VI probably work too,
but may be buggy.  The current set of tests cases do not give complete
coverage, but I think it is sufficient for an experimental assembler.

See the documentation in R600Usage for more information.

llvm-svn: 234381
2015-04-08 01:09:26 +00:00
Tom Stellard 8980dc323b R600/SI: Add missing SOPK instructions
llvm-svn: 234380
2015-04-08 01:09:22 +00:00
Tom Stellard 1f3416a63d R600/SI: Don't print offset0/offset1 DS operands when they are 0
llvm-svn: 234379
2015-04-08 01:09:19 +00:00
Tim Northover 5b44f1ba19 AArch64: disallow "fmov sD, #-0.0" during assembly.
We weren't checking the sign of the floating point immediate before translating
it to "fmov sD, wzr". Similarly for D-regs.

Technically "movi vD.2s, #0x80, lsl #24" would work most of the time, but it's
not a blessed alias (and I don't think it should be since people expect writing
sD to zero out the high lanes, and there's no dD equivalent). So an error it is.

rdar://20455398

llvm-svn: 234372
2015-04-07 22:49:47 +00:00
Ahmed Bougacha 273a9b4f03 [ARM] Mark a bunch of .td Operands with type _MEMORY.
This shouldn't affect anything in-tree, as the OperandType users are
mostly smart disassemblers and such; more information is helpful there.
However, on the flip side, that + the fact that this is just hinting at
the meaning of operands makes this not really test-worthy or testable.

Differential Revision: http://reviews.llvm.org/D8620

llvm-svn: 234350
2015-04-07 20:31:16 +00:00
Alexei Starovoitov 94c259d2ae [bpf] fix build
fix the build and remove unused variable warnings in Release mode.

Patch by Brenden Blanco.

llvm-svn: 234349
2015-04-07 20:25:34 +00:00
Matthias Braun b6ac8fa39e AArch64: Don't lower ISD::SELECT to ISD::SELECT_CC
Instead of lowering SELECT to SELECT_CC which is further lowered later
immediately call the SELECT_CC lowering code. This is preferable
because:
- Avoids an unnecessary roundtrip through the legalization queues with
  an intermediate node.
- More importantly: Lowered operations get visited last leading to SELECT_CC
  getting visited with legalized operands and unlegalized ones for preexisting
  SELECT_CC nodes. This does not hurt the current code (hence no testcase) but
  is required for another patch I am working on.

Differential Revision: http://reviews.llvm.org/D8187

llvm-svn: 234334
2015-04-07 17:33:05 +00:00
Toma Tabacu f25949b588 [mips] [IAS] Allow .set assignments for already defined symbols.
Summary:
This is not possible when using the IAS for MIPS, but it is possible when using the IAS for other architectures and when using GAS for MIPS.


Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8578

llvm-svn: 234316
2015-04-07 13:59:39 +00:00
Rafael Espindola b91455b5c0 Refactor a lot of duplicated code for stub output.
This also moves it earlier so that it they are produced before we print
an end symbol for the data section.

llvm-svn: 234315
2015-04-07 13:42:44 +00:00
Aaron Ballman ac33624075 Silencing several "enumeral and non-enumeral type in conditional expression" warnings; NFC.
llvm-svn: 234314
2015-04-07 13:28:37 +00:00
Duncan P. N. Exon Smith e686f1591f CodeGen: Stop using DIDescriptor::is*() and auto-casting
Same as r234255, but for lib/CodeGen and lib/Target.

llvm-svn: 234258
2015-04-06 23:27:40 +00:00
Tim Northover 42335572bb ARM: do not relax Thumb1 -> Thumb2 if only Thumb1 is available.
After recognising that a certain narrow instruction might need a relocation to
be represented, we used to unconditionally relax it to a Thumb2 instruction to
permit this. Unfortunately, some CPUs (e.g. v6m) don't even have most Thumb2
instructions, so we end up emitting a completely invalid instruction.

Theoretically, ELF does have relocations for these situations; but they are
fairly unusable with such short ranges and the ABI document even says they're
documented "for completeness". So an error is probably better there too.

rdar://20391953

llvm-svn: 234195
2015-04-06 18:44:42 +00:00
Simon Pilgrim 49ba9b8e2f [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into a XMM register instead of merging pairs of i8 scalars into a i16 and using the SSE2 PINSRW instruction.

This allows folding of byte loads and reduces scalar register usage as well.

Differential Revision: http://reviews.llvm.org/D8839

llvm-svn: 234193
2015-04-06 18:39:00 +00:00
Rafael Espindola 972756b741 Remove unnecessary uses of AliasedSymbol.
As pr19627 points out, every use of AliasedSymbol is likely a bug.

The main use was to avoid the oddity of a variable showing up as undefined. That
was fixed in r233995, which made these calls nops.

llvm-svn: 234169
2015-04-06 16:10:05 +00:00
Rafael Espindola 61e8ce36be Store the sh_link of ARM_EXIDX directly in MCSectionELF.
This avoids some pretty horrible and broken name based section handling.

llvm-svn: 234142
2015-04-06 04:25:18 +00:00
Rafael Espindola 8ca44f0b5c Implement unique sections with an unique ID.
This allows the compiler/assembly programmer to switch back to a
section. This in turn fixes the bootstrap failure on powerpc (tested
on gcc110) without changing the ppc codegen at all.

I will try to cleanup the various getELFSection overloads in a  followup patch.
Just using a default argument now would lead to ambiguities.

llvm-svn: 234099
2015-04-04 18:02:01 +00:00
Craig Topper 7ea899a15f [X86] Apply AddedComplexity consistently for similar patterns. This keeps them together in the DAGISel tables and reduces table size slightly.
llvm-svn: 234086
2015-04-04 04:22:12 +00:00
Craig Topper 3d44178733 [X86] Add a comment about the change in r234075.
llvm-svn: 234079
2015-04-04 02:31:43 +00:00
Craig Topper 9012028738 [X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead.
Previously the patterns didn't have high enough priority and we would only use the GR32 form if the only the upper 32 or 56 bits were zero.

Fixes PR23100.

llvm-svn: 234075
2015-04-04 02:08:20 +00:00
David Majnemer 3337064a47 [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

llvm-svn: 234061
2015-04-03 22:32:26 +00:00
David Blaikie aa41cd57e0 [opaque pointer type] More GEP IRBuilder API migrations...
llvm-svn: 234058
2015-04-03 21:33:42 +00:00
David Blaikie 93c5444fe0 [opaque pointer type] More GEP API migrations in IRBuilder uses
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.

Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.

llvm-svn: 234042
2015-04-03 19:41:44 +00:00
Duncan P. N. Exon Smith 3bef6a3803 CodeGen: Assert that inlined-at locations agree
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.

The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.

If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.

llvm-svn: 234038
2015-04-03 19:20:26 +00:00
Simon Pilgrim 0184622bbc [X86] Added SSE4.2 CRC32 memory folding patterns + tests
llvm-svn: 234013
2015-04-03 14:24:40 +00:00
Bill Schmidt 91dd765a04 [PowerPC] Enable splat generation for BUILD_VECTOR with little endian
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced.  I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian.  With this changed to correctly identify the target
endianness, the optimizations work as expected.

I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector().  I discovered this was
an orphaned method with no callers, so I've just removed it.

The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.

llvm-svn: 234011
2015-04-03 13:48:24 +00:00
Simon Pilgrim 8dba5da06d [X86][3DNow] Added 3DNow! memory folding patterns + tests
llvm-svn: 234008
2015-04-03 11:50:30 +00:00
Peter Collingbourne 10d362c51b MC: For variable symbols, maintain MCSymbol::Section as a cache.
Fixes PR19582.

Previously, when an asm assignment (.set or =) was created, we would look up
the section immediately in MCSymbol::setVariableValue. This caused symbols
to receive the wrong section if the RHS of the assignment had not been seen
yet. This had a knock-on effect in the object file emitters, causing them
to emit extra symbols, or to give symbols the wrong visibility or the wrong
section. For example, in the following asm:

.data
.Llocal:

.text
leaq .Llocal1(%rip), %rdi
.Llocal1 = .Llocal2
.Llocal2 = .Llocal

the first assignment would give .Llocal1 a null section, which would never get
fixed up by the second assignment. This would cause the ELF object file emitter
to consider .Llocal1 to be an undefined symbol and give it external linkage,
even though .Llocal1 should not have been emitted at all in the object file.

Or in the following asm:

alias_to_local = Ltmp0
Ltmp0:

the Mach-O object file emitter would give the alias_to_local symbol a n_type
of N_SECT and a n_sect of 0.  This is invalid under the Mach-O specification,
which requires N_SECT symbols to receive a non-zero section number if the
symbol is defined in a section in the object file.

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist

After this change we do not look up the section when the assignment is created,
but instead look it up on demand and store it in Section, which is treated
as a cache if the symbol is a variable symbol.

This change also fixes a bug in MCExpr::FindAssociatedSection. Previously,
if we saw a subtraction, we would return the first referenced section, even in
cases where we should have been returning the absolute pseudo-section. Now we
always return the absolute pseudo-section for expressions that subtract two
section-derived expressions. This isn't always correct (e.g. if one of the
sections ends up being laid out at an absolute address), but it's probably
the best we can do without more context.

This allows us to remove code in two places where we appear to have been
working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols
and in X86AsmPrinter::EmitStartOfAsmFile.

Re-applies r233595 (aka D8586), which was reverted in r233898.

Differential Revision: http://reviews.llvm.org/D8798

llvm-svn: 233995
2015-04-03 01:46:11 +00:00
Matthias Braun 6a42d7fd6b ARM: Handle physreg targets in RegPair hints gracefully
Register coalescing can change the target of a RegPair hint to a
physreg, we should not crash on this. This also slightly improved the
way ARMBaseRegisterInfo::updateRegAllocHint() works.

llvm-svn: 233987
2015-04-03 00:18:38 +00:00
Sanjay Patel eca590ffb3 [AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
	movzwl	(%rdi), %eax
	vmovd	%eax, %xmm0
	vxorps	%xmm1, %xmm1, %xmm1
	vblendps	$15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros are free with the vmovd:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233941
2015-04-02 20:21:52 +00:00
David Blaikie 4a2e73b066 [opaque pointer type] API migration for GEP constant factories
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.

llvm-svn: 233938
2015-04-02 18:55:32 +00:00
Quentin Colombet a64723c2bf [AArch64] Add a comment to make it explicit why we increased the complexity.
Follow-up of r233653.

llvm-svn: 233936
2015-04-02 18:54:23 +00:00
Sanjay Patel 2bb5d695f9 [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
For code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

    We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs).
    We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.

[1] AddedComplexity has close to no documentation in the source. 
The best we have is this comment: "roughly corresponds to the number of nodes that are covered". 
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). 

I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Differential Revision: http://reviews.llvm.org/D8794

llvm-svn: 233931
2015-04-02 17:56:17 +00:00
Vasileios Kalintiris 580f067f72 [mips] Implement eliminateCallFramePseudoInstr() in MipsFrameLowering. NFC.
Summary:
Avoid duplicate code in Mips16FrameLowering and MipsSEFrameLowering by
providing an implementation of the eliminateCallFramePseudoInstr()
function from their base class.

Depends on D8640.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8641

llvm-svn: 233909
2015-04-02 11:09:40 +00:00
Elena Demikhovsky 1eeece1285 AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233906
2015-04-02 10:51:40 +00:00
Vasileios Kalintiris 6d68778588 [mips] Expose adjustStackPtr() from MipsInstrInfo. NFC.
Summary:
adjustStackPtr() is implemented from both MipsSEInstrInfo and
Mips16InstrInfo. It makes sense to expose this function from
MipsInstrInfo and avoid explicit casting in some cases.

Depends on D8638.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8640

llvm-svn: 233905
2015-04-02 10:42:44 +00:00
Vasileios Kalintiris b3698a53fa [mips] Make sure that we don't adjust the stack pointer by zero amount.
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8638

llvm-svn: 233904
2015-04-02 10:14:54 +00:00
Peter Collingbourne 949eb3f6a7 Revert r233595, "MC: For variable symbols, maintain MCSymbol::Section as a cache."
llvm-svn: 233898
2015-04-02 07:02:51 +00:00
Benjamin Kramer 3a16a36bb6 [X86] Don't accidentally select shll $1, %eax when shrinking an immediate.
addl has higher throughput and this was needlessly picking a suboptimal
encoding causing PR23098.

I wish there was a way of doing this without further duplicating tbl-
generated patterns, but so far I haven't found one.

llvm-svn: 233832
2015-04-01 19:01:09 +00:00
Vladimir Sukharev 2afdb32c06 [ARM] Rename v8.1a from "extension" to "architecture"
v8.1a is renamed to architecture, following current entity naming approach.

Excess generic cpu is removed. Intended use: "generic" cpu with "v8.1a" subtarget feature

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8767

llvm-svn: 233811
2015-04-01 14:54:56 +00:00
Vladimir Sukharev 439328e172 [AArch64] Rename v8.1a from "extension" to "architecture"
v8.1a is renamed to architecture, accordingly to approaches in ARM backend.

Excess generic cpu is removed. Intended use: "generic" cpu with "v8.1a" subtarget feature

Reviewers: jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8766

llvm-svn: 233810
2015-04-01 14:49:29 +00:00
Ulrich Weigand 57c85f53ba [SystemZ] Support transactional execution on zEC12
The zEC12 provides the transactional-execution facility.  This is exposed
to users via a set of builtin routines on other compilers.  This patch
adds LLVM support to enable those builtins.  In partciular, the patch:

- adds the transactional-execution and processor-assist facilities
- adds MC support for all instructions provided by those facilities
- adds LLVM intrinsics for those instructions and hooks them up for CodeGen
- adds CodeGen support to optimize CC return value checking

Since this is first use of target-specific intrinsics on the platform,
the patch creates the include/llvm/IR/IntrinsicsSystemZ.td file and
hooks it up in Intrinsics.td.  I've also changed Triple::getArchTypePrefix
to return "s390" instead of "systemz", since the naming convention for
GCC intrinsics uses "s390" on the platform, and it neemed more straight-
forward to use the same convention for LLVM IR intrinsics.

An associated clang patch makes the intrinsics (and command line switches)
available at the source-language level.

For reference, the transactional-execution instructions are documented
in the z/Architecture Principles of Operation for the zEC12:
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/DZ9ZR009.pdf
The associated builtins are documented in the GCC manual:
http://gcc.gnu.org/onlinedocs/gcc/S_002f390-System-z-Built-in-Functions.html


Index: llvm-head/lib/Target/SystemZ/SystemZOperators.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZOperators.td
+++ llvm-head/lib/Target/SystemZ/SystemZOperators.td
@@ -79,6 +79,9 @@ def SDT_ZI32Intrinsic       : SDTypeProf
 def SDT_ZPrefetch           : SDTypeProfile<0, 2,
                                             [SDTCisVT<0, i32>,
                                              SDTCisPtrTy<1>]>;
+def SDT_ZTBegin             : SDTypeProfile<0, 2,
+                                            [SDTCisPtrTy<0>,
+                                             SDTCisVT<1, i32>]>;
 
 //===----------------------------------------------------------------------===//
 // Node definitions
@@ -180,6 +183,15 @@ def z_prefetch          : SDNode<"System
                                  [SDNPHasChain, SDNPMayLoad, SDNPMayStore,
                                   SDNPMemOperand]>;
 
+def z_tbegin            : SDNode<"SystemZISD::TBEGIN", SDT_ZTBegin,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPMayStore,
+                                  SDNPSideEffect]>;
+def z_tbegin_nofloat    : SDNode<"SystemZISD::TBEGIN_NOFLOAT", SDT_ZTBegin,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPMayStore,
+                                  SDNPSideEffect]>;
+def z_tend              : SDNode<"SystemZISD::TEND", SDTNone,
+                                 [SDNPHasChain, SDNPOutGlue, SDNPSideEffect]>;
+
 //===----------------------------------------------------------------------===//
 // Pattern fragments
 //===----------------------------------------------------------------------===//
Index: llvm-head/lib/Target/SystemZ/SystemZInstrFormats.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZInstrFormats.td
+++ llvm-head/lib/Target/SystemZ/SystemZInstrFormats.td
@@ -473,6 +473,17 @@ class InstSS<bits<8> op, dag outs, dag i
   let Inst{15-0}  = BD2;
 }
 
+class InstS<bits<16> op, dag outs, dag ins, string asmstr, list<dag> pattern>
+  : InstSystemZ<4, outs, ins, asmstr, pattern> {
+  field bits<32> Inst;
+  field bits<32> SoftFail = 0;
+
+  bits<16> BD2;
+
+  let Inst{31-16} = op;
+  let Inst{15-0}  = BD2;
+}
+
 //===----------------------------------------------------------------------===//
 // Instruction definitions with semantics
 //===----------------------------------------------------------------------===//
Index: llvm-head/lib/Target/SystemZ/SystemZInstrInfo.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZInstrInfo.td
+++ llvm-head/lib/Target/SystemZ/SystemZInstrInfo.td
@@ -1362,6 +1362,60 @@ let Defs = [CC] in {
 }
 
 //===----------------------------------------------------------------------===//
+// Transactional execution
+//===----------------------------------------------------------------------===//
+
+let Predicates = [FeatureTransactionalExecution] in {
+  // Transaction Begin
+  let hasSideEffects = 1, mayStore = 1,
+      usesCustomInserter = 1, Defs = [CC] in {
+    def TBEGIN : InstSIL<0xE560,
+                         (outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                         "tbegin\t$BD1, $I2",
+                         [(z_tbegin bdaddr12only:$BD1, imm32zx16:$I2)]>;
+    def TBEGIN_nofloat : Pseudo<(outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                                [(z_tbegin_nofloat bdaddr12only:$BD1,
+                                                   imm32zx16:$I2)]>;
+    def TBEGINC : InstSIL<0xE561,
+                          (outs), (ins bdaddr12only:$BD1, imm32zx16:$I2),
+                          "tbeginc\t$BD1, $I2",
+                          [(int_s390_tbeginc bdaddr12only:$BD1,
+                                             imm32zx16:$I2)]>;
+  }
+
+  // Transaction End
+  let hasSideEffects = 1, Defs = [CC], BD2 = 0 in
+    def TEND : InstS<0xB2F8, (outs), (ins), "tend", [(z_tend)]>;
+
+  // Transaction Abort
+  let hasSideEffects = 1, isTerminator = 1, isBarrier = 1 in
+    def TABORT : InstS<0xB2FC, (outs), (ins bdaddr12only:$BD2),
+                       "tabort\t$BD2",
+                       [(int_s390_tabort bdaddr12only:$BD2)]>;
+
+  // Nontransactional Store
+  let hasSideEffects = 1 in
+    def NTSTG : StoreRXY<"ntstg", 0xE325, int_s390_ntstg, GR64, 8>;
+
+  // Extract Transaction Nesting Depth
+  let hasSideEffects = 1 in
+    def ETND : InherentRRE<"etnd", 0xB2EC, GR32, (int_s390_etnd)>;
+}
+
+//===----------------------------------------------------------------------===//
+// Processor assist
+//===----------------------------------------------------------------------===//
+
+let Predicates = [FeatureProcessorAssist] in {
+  let hasSideEffects = 1, R4 = 0 in
+    def PPA : InstRRF<0xB2E8, (outs), (ins GR64:$R1, GR64:$R2, imm32zx4:$R3),
+                      "ppa\t$R1, $R2, $R3", []>;
+  def : Pat<(int_s390_ppa_txassist GR32:$src),
+            (PPA (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GR32:$src, subreg_l32),
+                 0, 1)>;
+}
+
+//===----------------------------------------------------------------------===//
 // Miscellaneous Instructions.
 //===----------------------------------------------------------------------===//
 
Index: llvm-head/lib/Target/SystemZ/SystemZProcessors.td
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZProcessors.td
+++ llvm-head/lib/Target/SystemZ/SystemZProcessors.td
@@ -60,6 +60,16 @@ def FeatureMiscellaneousExtensions : Sys
   "Assume that the miscellaneous-extensions facility is installed"
 >;
 
+def FeatureTransactionalExecution : SystemZFeature<
+  "transactional-execution", "TransactionalExecution",
+  "Assume that the transactional-execution facility is installed"
+>;
+
+def FeatureProcessorAssist : SystemZFeature<
+  "processor-assist", "ProcessorAssist",
+  "Assume that the processor-assist facility is installed"
+>;
+
 def : Processor<"generic", NoItineraries, []>;
 def : Processor<"z10", NoItineraries, []>;
 def : Processor<"z196", NoItineraries,
@@ -70,4 +80,5 @@ def : Processor<"zEC12", NoItineraries,
                 [FeatureDistinctOps, FeatureLoadStoreOnCond, FeatureHighWord,
                  FeatureFPExtension, FeaturePopulationCount,
                  FeatureFastSerialization, FeatureInterlockedAccess1,
-                 FeatureMiscellaneousExtensions]>;
+                 FeatureMiscellaneousExtensions,
+                 FeatureTransactionalExecution, FeatureProcessorAssist]>;
Index: llvm-head/lib/Target/SystemZ/SystemZSubtarget.cpp
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZSubtarget.cpp
+++ llvm-head/lib/Target/SystemZ/SystemZSubtarget.cpp
@@ -40,6 +40,7 @@ SystemZSubtarget::SystemZSubtarget(const
       HasLoadStoreOnCond(false), HasHighWord(false), HasFPExtension(false),
       HasPopulationCount(false), HasFastSerialization(false),
       HasInterlockedAccess1(false), HasMiscellaneousExtensions(false),
+      HasTransactionalExecution(false), HasProcessorAssist(false),
       TargetTriple(TT), InstrInfo(initializeSubtargetDependencies(CPU, FS)),
       TLInfo(TM, *this), TSInfo(*TM.getDataLayout()), FrameLowering() {}
 
Index: llvm-head/lib/Target/SystemZ/SystemZSubtarget.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZSubtarget.h
+++ llvm-head/lib/Target/SystemZ/SystemZSubtarget.h
@@ -42,6 +42,8 @@ protected:
   bool HasFastSerialization;
   bool HasInterlockedAccess1;
   bool HasMiscellaneousExtensions;
+  bool HasTransactionalExecution;
+  bool HasProcessorAssist;
 
 private:
   Triple TargetTriple;
@@ -102,6 +104,12 @@ public:
     return HasMiscellaneousExtensions;
   }
 
+  // Return true if the target has the transactional-execution facility.
+  bool hasTransactionalExecution() const { return HasTransactionalExecution; }
+
+  // Return true if the target has the processor-assist facility.
+  bool hasProcessorAssist() const { return HasProcessorAssist; }
+
   // Return true if GV can be accessed using LARL for reloc model RM
   // and code model CM.
   bool isPC32DBLSymbol(const GlobalValue *GV, Reloc::Model RM,
Index: llvm-head/lib/Support/Triple.cpp
===================================================================
--- llvm-head.orig/lib/Support/Triple.cpp
+++ llvm-head/lib/Support/Triple.cpp
@@ -92,7 +92,7 @@ const char *Triple::getArchTypePrefix(Ar
   case sparcv9:
   case sparc:       return "sparc";
 
-  case systemz:     return "systemz";
+  case systemz:     return "s390";
 
   case x86:
   case x86_64:      return "x86";
Index: llvm-head/include/llvm/IR/Intrinsics.td
===================================================================
--- llvm-head.orig/include/llvm/IR/Intrinsics.td
+++ llvm-head/include/llvm/IR/Intrinsics.td
@@ -634,3 +634,4 @@ include "llvm/IR/IntrinsicsNVVM.td"
 include "llvm/IR/IntrinsicsMips.td"
 include "llvm/IR/IntrinsicsR600.td"
 include "llvm/IR/IntrinsicsBPF.td"
+include "llvm/IR/IntrinsicsSystemZ.td"
Index: llvm-head/include/llvm/IR/IntrinsicsSystemZ.td
===================================================================
--- /dev/null
+++ llvm-head/include/llvm/IR/IntrinsicsSystemZ.td
@@ -0,0 +1,46 @@
+//===- IntrinsicsSystemZ.td - Defines SystemZ intrinsics ---*- tablegen -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines all of the SystemZ-specific intrinsics.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+//
+// Transactional-execution intrinsics
+//
+//===----------------------------------------------------------------------===//
+
+let TargetPrefix = "s390" in {
+  def int_s390_tbegin : Intrinsic<[llvm_i32_ty], [llvm_ptr_ty, llvm_i32_ty],
+                                  [IntrNoDuplicate]>;
+
+  def int_s390_tbegin_nofloat : Intrinsic<[llvm_i32_ty],
+                                          [llvm_ptr_ty, llvm_i32_ty],
+                                          [IntrNoDuplicate]>;
+
+  def int_s390_tbeginc : Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
+                                   [IntrNoDuplicate]>;
+
+  def int_s390_tabort : Intrinsic<[], [llvm_i64_ty],
+                                  [IntrNoReturn, Throws]>;
+
+  def int_s390_tend : GCCBuiltin<"__builtin_tend">,
+                      Intrinsic<[llvm_i32_ty], []>;
+
+  def int_s390_etnd : GCCBuiltin<"__builtin_tx_nesting_depth">,
+                      Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>;
+
+  def int_s390_ntstg : Intrinsic<[], [llvm_i64_ty, llvm_ptr64_ty],
+                                 [IntrReadWriteArgMem]>;
+
+  def int_s390_ppa_txassist : GCCBuiltin<"__builtin_tx_assist">,
+                              Intrinsic<[], [llvm_i32_ty]>;
+}
+
Index: llvm-head/lib/Target/SystemZ/SystemZ.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZ.h
+++ llvm-head/lib/Target/SystemZ/SystemZ.h
@@ -68,6 +68,18 @@ const unsigned CCMASK_TM_MSB_0       = C
 const unsigned CCMASK_TM_MSB_1       = CCMASK_2 | CCMASK_3;
 const unsigned CCMASK_TM             = CCMASK_ANY;
 
+// Condition-code mask assignments for TRANSACTION_BEGIN.
+const unsigned CCMASK_TBEGIN_STARTED       = CCMASK_0;
+const unsigned CCMASK_TBEGIN_INDETERMINATE = CCMASK_1;
+const unsigned CCMASK_TBEGIN_TRANSIENT     = CCMASK_2;
+const unsigned CCMASK_TBEGIN_PERSISTENT    = CCMASK_3;
+const unsigned CCMASK_TBEGIN               = CCMASK_ANY;
+
+// Condition-code mask assignments for TRANSACTION_END.
+const unsigned CCMASK_TEND_TX   = CCMASK_0;
+const unsigned CCMASK_TEND_NOTX = CCMASK_2;
+const unsigned CCMASK_TEND      = CCMASK_TEND_TX | CCMASK_TEND_NOTX;
+
 // The position of the low CC bit in an IPM result.
 const unsigned IPM_CC = 28;
 
Index: llvm-head/lib/Target/SystemZ/SystemZISelLowering.h
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZISelLowering.h
+++ llvm-head/lib/Target/SystemZ/SystemZISelLowering.h
@@ -146,6 +146,15 @@ enum {
   // Perform a serialization operation.  (BCR 15,0 or BCR 14,0.)
   SERIALIZE,
 
+  // Transaction begin.  The first operand is the chain, the second
+  // the TDB pointer, and the third the immediate control field.
+  // Returns chain and glue.
+  TBEGIN,
+  TBEGIN_NOFLOAT,
+
+  // Transaction end.  Just the chain operand.  Returns chain and glue.
+  TEND,
+
   // Wrappers around the inner loop of an 8- or 16-bit ATOMIC_SWAP or
   // ATOMIC_LOAD_<op>.
   //
@@ -318,6 +327,7 @@ private:
   SDValue lowerSTACKSAVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSTACKRESTORE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerPREFETCH(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG &DAG) const;
 
   // If the last instruction before MBBI in MBB was some form of COMPARE,
   // try to replace it with a COMPARE AND BRANCH just before MBBI.
@@ -355,6 +365,10 @@ private:
   MachineBasicBlock *emitStringWrapper(MachineInstr *MI,
                                        MachineBasicBlock *BB,
                                        unsigned Opcode) const;
+  MachineBasicBlock *emitTransactionBegin(MachineInstr *MI,
+                                          MachineBasicBlock *MBB,
+                                          unsigned Opcode,
+                                          bool NoFloat) const;
 };
 } // end namespace llvm
 
Index: llvm-head/lib/Target/SystemZ/SystemZISelLowering.cpp
===================================================================
--- llvm-head.orig/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ llvm-head/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -20,6 +20,7 @@
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+#include "llvm/IR/Intrinsics.h"
 #include <cctype>
 
 using namespace llvm;
@@ -304,6 +305,9 @@ SystemZTargetLowering::SystemZTargetLowe
   // Codes for which we want to perform some z-specific combinations.
   setTargetDAGCombine(ISD::SIGN_EXTEND);
 
+  // Handle intrinsics.
+  setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
+
   // We want to use MVC in preference to even a single load/store pair.
   MaxStoresPerMemcpy = 0;
   MaxStoresPerMemcpyOptSize = 0;
@@ -1031,6 +1035,53 @@ prepareVolatileOrAtomicLoad(SDValue Chai
   return DAG.getNode(SystemZISD::SERIALIZE, DL, MVT::Other, Chain);
 }
 
+// Return true if Op is an intrinsic node with chain that returns the CC value
+// as its only (other) argument.  Provide the associated SystemZISD opcode and
+// the mask of valid CC values if so.
+static bool isIntrinsicWithCCAndChain(SDValue Op, unsigned &Opcode,
+                                      unsigned &CCValid) {
+  unsigned Id = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
+  switch (Id) {
+  case Intrinsic::s390_tbegin:
+    Opcode = SystemZISD::TBEGIN;
+    CCValid = SystemZ::CCMASK_TBEGIN;
+    return true;
+
+  case Intrinsic::s390_tbegin_nofloat:
+    Opcode = SystemZISD::TBEGIN_NOFLOAT;
+    CCValid = SystemZ::CCMASK_TBEGIN;
+    return true;
+
+  case Intrinsic::s390_tend:
+    Opcode = SystemZISD::TEND;
+    CCValid = SystemZ::CCMASK_TEND;
+    return true;
+
+  default:
+    return false;
+  }
+}
+
+// Emit an intrinsic with chain with a glued value instead of its CC result.
+static SDValue emitIntrinsicWithChainAndGlue(SelectionDAG &DAG, SDValue Op,
+                                             unsigned Opcode) {
+  // Copy all operands except the intrinsic ID.
+  unsigned NumOps = Op.getNumOperands();
+  SmallVector<SDValue, 6> Ops;
+  Ops.reserve(NumOps - 1);
+  Ops.push_back(Op.getOperand(0));
+  for (unsigned I = 2; I < NumOps; ++I)
+    Ops.push_back(Op.getOperand(I));
+
+  assert(Op->getNumValues() == 2 && "Expected only CC result and chain");
+  SDVTList RawVTs = DAG.getVTList(MVT::Other, MVT::Glue);
+  SDValue Intr = DAG.getNode(Opcode, SDLoc(Op), RawVTs, Ops);
+  SDValue OldChain = SDValue(Op.getNode(), 1);
+  SDValue NewChain = SDValue(Intr.getNode(), 0);
+  DAG.ReplaceAllUsesOfValueWith(OldChain, NewChain);
+  return Intr;
+}
+
 // CC is a comparison that will be implemented using an integer or
 // floating-point comparison.  Return the condition code mask for
 // a branch on true.  In the integer case, CCMASK_CMP_UO is set for
@@ -1588,9 +1639,53 @@ static void adjustForTestUnderMask(Selec
   C.CCMask = NewCCMask;
 }
 
+// Return a Comparison that tests the condition-code result of intrinsic
+// node Call against constant integer CC using comparison code Cond.
+// Opcode is the opcode of the SystemZISD operation for the intrinsic
+// and CCValid is the set of possible condition-code results.
+static Comparison getIntrinsicCmp(SelectionDAG &DAG, unsigned Opcode,
+                                  SDValue Call, unsigned CCValid, uint64_t CC,
+                                  ISD::CondCode Cond) {
+  Comparison C(Call, SDValue());
+  C.Opcode = Opcode;
+  C.CCValid = CCValid;
+  if (Cond == ISD::SETEQ)
+    // bit 3 for CC==0, bit 0 for CC==3, always false for CC>3.
+    C.CCMask = CC < 4 ? 1 << (3 - CC) : 0;
+  else if (Cond == ISD::SETNE)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(1 << (3 - CC)) : -1;
+  else if (Cond == ISD::SETLT || Cond == ISD::SETULT)
+    // bits above bit 3 for CC==0 (always false), bits above bit 0 for CC==3,
+    // always true for CC>3.
+    C.CCMask = CC < 4 ? -1 << (4 - CC) : -1;
+  else if (Cond == ISD::SETGE || Cond == ISD::SETUGE)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(-1 << (4 - CC)) : 0;
+  else if (Cond == ISD::SETLE || Cond == ISD::SETULE)
+    // bit 3 and above for CC==0, bit 0 and above for CC==3 (always true),
+    // always true for CC>3.
+    C.CCMask = CC < 4 ? -1 << (3 - CC) : -1;
+  else if (Cond == ISD::SETGT || Cond == ISD::SETUGT)
+    // ...and the inverse of that.
+    C.CCMask = CC < 4 ? ~(-1 << (3 - CC)) : 0;
+  else
+    llvm_unreachable("Unexpected integer comparison type");
+  C.CCMask &= CCValid;
+  return C;
+}
+
 // Decide how to implement a comparison of type Cond between CmpOp0 with CmpOp1.
 static Comparison getCmp(SelectionDAG &DAG, SDValue CmpOp0, SDValue CmpOp1,
                          ISD::CondCode Cond) {
+  if (CmpOp1.getOpcode() == ISD::Constant) {
+    uint64_t Constant = cast<ConstantSDNode>(CmpOp1)->getZExtValue();
+    unsigned Opcode, CCValid;
+    if (CmpOp0.getOpcode() == ISD::INTRINSIC_W_CHAIN &&
+        CmpOp0.getResNo() == 0 && CmpOp0->hasNUsesOfValue(1, 0) &&
+        isIntrinsicWithCCAndChain(CmpOp0, Opcode, CCValid))
+      return getIntrinsicCmp(DAG, Opcode, CmpOp0, CCValid, Constant, Cond);
+  }
   Comparison C(CmpOp0, CmpOp1);
   C.CCMask = CCMaskForCondCode(Cond);
   if (C.Op0.getValueType().isFloatingPoint()) {
@@ -1632,6 +1727,17 @@ static Comparison getCmp(SelectionDAG &D
 
 // Emit the comparison instruction described by C.
 static SDValue emitCmp(SelectionDAG &DAG, SDLoc DL, Comparison &C) {
+  if (!C.Op1.getNode()) {
+    SDValue Op;
+    switch (C.Op0.getOpcode()) {
+    case ISD::INTRINSIC_W_CHAIN:
+      Op = emitIntrinsicWithChainAndGlue(DAG, C.Op0, C.Opcode);
+      break;
+    default:
+      llvm_unreachable("Invalid comparison operands");
+    }
+    return SDValue(Op.getNode(), Op->getNumValues() - 1);
+  }
   if (C.Opcode == SystemZISD::ICMP)
     return DAG.getNode(SystemZISD::ICMP, DL, MVT::Glue, C.Op0, C.Op1,
                        DAG.getConstant(C.ICmpType, MVT::i32));
@@ -1713,7 +1819,6 @@ SDValue SystemZTargetLowering::lowerSETC
 }
 
 SDValue SystemZTargetLowering::lowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
-  SDValue Chain    = Op.getOperand(0);
   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
   SDValue CmpOp0   = Op.getOperand(2);
   SDValue CmpOp1   = Op.getOperand(3);
@@ -1723,7 +1828,7 @@ SDValue SystemZTargetLowering::lowerBR_C
   Comparison C(getCmp(DAG, CmpOp0, CmpOp1, CC));
   SDValue Glue = emitCmp(DAG, DL, C);
   return DAG.getNode(SystemZISD::BR_CCMASK, DL, Op.getValueType(),
-                     Chain, DAG.getConstant(C.CCValid, MVT::i32),
+                     Op.getOperand(0), DAG.getConstant(C.CCValid, MVT::i32),
                      DAG.getConstant(C.CCMask, MVT::i32), Dest, Glue);
 }
 
@@ -2561,6 +2666,30 @@ SDValue SystemZTargetLowering::lowerPREF
                                  Node->getMemoryVT(), Node->getMemOperand());
 }
 
+// Return an i32 that contains the value of CC immediately after After,
+// whose final operand must be MVT::Glue.
+static SDValue getCCResult(SelectionDAG &DAG, SDNode *After) {
+  SDValue Glue = SDValue(After, After->getNumValues() - 1);
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, SDLoc(After), MVT::i32, Glue);
+  return DAG.getNode(ISD::SRL, SDLoc(After), MVT::i32, IPM,
+                     DAG.getConstant(SystemZ::IPM_CC, MVT::i32));
+}
+
+SDValue
+SystemZTargetLowering::lowerINTRINSIC_W_CHAIN(SDValue Op,
+                                              SelectionDAG &DAG) const {
+  unsigned Opcode, CCValid;
+  if (isIntrinsicWithCCAndChain(Op, Opcode, CCValid)) {
+    assert(Op->getNumValues() == 2 && "Expected only CC result and chain");
+    SDValue Glued = emitIntrinsicWithChainAndGlue(DAG, Op, Opcode);
+    SDValue CC = getCCResult(DAG, Glued.getNode());
+    DAG.ReplaceAllUsesOfValueWith(SDValue(Op.getNode(), 0), CC);
+    return SDValue();
+  }
+
+  return SDValue();
+}
+
 SDValue SystemZTargetLowering::LowerOperation(SDValue Op,
                                               SelectionDAG &DAG) const {
   switch (Op.getOpcode()) {
@@ -2634,6 +2763,8 @@ SDValue SystemZTargetLowering::LowerOper
     return lowerSTACKRESTORE(Op, DAG);
   case ISD::PREFETCH:
     return lowerPREFETCH(Op, DAG);
+  case ISD::INTRINSIC_W_CHAIN:
+    return lowerINTRINSIC_W_CHAIN(Op, DAG);
   default:
     llvm_unreachable("Unexpected node to lower");
   }
@@ -2674,6 +2805,9 @@ const char *SystemZTargetLowering::getTa
     OPCODE(SEARCH_STRING);
     OPCODE(IPM);
     OPCODE(SERIALIZE);
+    OPCODE(TBEGIN);
+    OPCODE(TBEGIN_NOFLOAT);
+    OPCODE(TEND);
     OPCODE(ATOMIC_SWAPW);
     OPCODE(ATOMIC_LOADW_ADD);
     OPCODE(ATOMIC_LOADW_SUB);
@@ -3501,6 +3635,50 @@ SystemZTargetLowering::emitStringWrapper
   return DoneMBB;
 }
 
+// Update TBEGIN instruction with final opcode and register clobbers.
+MachineBasicBlock *
+SystemZTargetLowering::emitTransactionBegin(MachineInstr *MI,
+                                            MachineBasicBlock *MBB,
+                                            unsigned Opcode,
+                                            bool NoFloat) const {
+  MachineFunction &MF = *MBB->getParent();
+  const TargetFrameLowering *TFI = Subtarget.getFrameLowering();
+  const SystemZInstrInfo *TII = Subtarget.getInstrInfo();
+
+  // Update opcode.
+  MI->setDesc(TII->get(Opcode));
+
+  // We cannot handle a TBEGIN that clobbers the stack or frame pointer.
+  // Make sure to add the corresponding GRSM bits if they are missing.
+  uint64_t Control = MI->getOperand(2).getImm();
+  static const unsigned GPRControlBit[16] = {
+    0x8000, 0x8000, 0x4000, 0x4000, 0x2000, 0x2000, 0x1000, 0x1000,
+    0x0800, 0x0800, 0x0400, 0x0400, 0x0200, 0x0200, 0x0100, 0x0100
+  };
+  Control |= GPRControlBit[15];
+  if (TFI->hasFP(MF))
+    Control |= GPRControlBit[11];
+  MI->getOperand(2).setImm(Control);
+
+  // Add GPR clobbers.
+  for (int I = 0; I < 16; I++) {
+    if ((Control & GPRControlBit[I]) == 0) {
+      unsigned Reg = SystemZMC::GR64Regs[I];
+      MI->addOperand(MachineOperand::CreateReg(Reg, true, true));
+    }
+  }
+
+  // Add FPR clobbers.
+  if (!NoFloat && (Control & 4) != 0) {
+    for (int I = 0; I < 16; I++) {
+      unsigned Reg = SystemZMC::FP64Regs[I];
+      MI->addOperand(MachineOperand::CreateReg(Reg, true, true));
+    }
+  }
+
+  return MBB;
+}
+
 MachineBasicBlock *SystemZTargetLowering::
 EmitInstrWithCustomInserter(MachineInstr *MI, MachineBasicBlock *MBB) const {
   switch (MI->getOpcode()) {
@@ -3742,6 +3920,12 @@ EmitInstrWithCustomInserter(MachineInstr
     return emitStringWrapper(MI, MBB, SystemZ::MVST);
   case SystemZ::SRSTLoop:
     return emitStringWrapper(MI, MBB, SystemZ::SRST);
+  case SystemZ::TBEGIN:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGIN, false);
+  case SystemZ::TBEGIN_nofloat:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGIN, true);
+  case SystemZ::TBEGINC:
+    return emitTransactionBegin(MI, MBB, SystemZ::TBEGINC, true);
   default:
     llvm_unreachable("Unexpected instr type to insert");
   }
Index: llvm-head/test/CodeGen/SystemZ/htm-intrinsics.ll
===================================================================
--- /dev/null
+++ llvm-head/test/CodeGen/SystemZ/htm-intrinsics.ll
@@ -0,0 +1,352 @@
+; Test transactional-execution intrinsics.
+;
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=zEC12 | FileCheck %s
+
+declare i32 @llvm.s390.tbegin(i8 *, i32)
+declare i32 @llvm.s390.tbegin.nofloat(i8 *, i32)
+declare void @llvm.s390.tbeginc(i8 *, i32)
+declare i32 @llvm.s390.tend()
+declare void @llvm.s390.tabort(i64)
+declare void @llvm.s390.ntstg(i64, i64 *)
+declare i32 @llvm.s390.etnd()
+declare void @llvm.s390.ppa.txassist(i32)
+
+; TBEGIN.
+define void @test_tbegin() {
+; CHECK-LABEL: test_tbegin:
+; CHECK-NOT: stmg
+; CHECK: std %f8,
+; CHECK: std %f9,
+; CHECK: std %f10,
+; CHECK: std %f11,
+; CHECK: std %f12,
+; CHECK: std %f13,
+; CHECK: std %f14,
+; CHECK: std %f15,
+; CHECK: tbegin 0, 65292
+; CHECK: ld %f8,
+; CHECK: ld %f9,
+; CHECK: ld %f10,
+; CHECK: ld %f11,
+; CHECK: ld %f12,
+; CHECK: ld %f13,
+; CHECK: ld %f14,
+; CHECK: ld %f15,
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin(i8 *null, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat).
+define void @test_tbegin_nofloat1() {
+; CHECK-LABEL: test_tbegin_nofloat1:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat) with integer CC return value.
+define i32 @test_tbegin_nofloat2() {
+; CHECK-LABEL: test_tbegin_nofloat2:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  ret i32 %res
+}
+
+; TBEGIN (nofloat) with implicit CC check.
+define void @test_tbegin_nofloat3(i32 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat3:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: jnh  {{\.L*}}
+; CHECK: mvhi 0(%r2), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+; TBEGIN (nofloat) with dual CC use.
+define i32 @test_tbegin_nofloat4(i32 %pad, i32 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat4:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65292
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: cijlh %r2, 2,  {{\.L*}}
+; CHECK: mvhi 0(%r3), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65292)
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret i32 %res
+}
+
+; TBEGIN (nofloat) with register.
+define void @test_tbegin_nofloat5(i8 *%ptr) {
+; CHECK-LABEL: test_tbegin_nofloat5:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0(%r2), 65292
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *%ptr, i32 65292)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0x0f00.
+define void @test_tbegin_nofloat6() {
+; CHECK-LABEL: test_tbegin_nofloat6:
+; CHECK: stmg %r6, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 3840
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 3840)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xf100.
+define void @test_tbegin_nofloat7() {
+; CHECK-LABEL: test_tbegin_nofloat7:
+; CHECK: stmg %r8, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 61696
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 61696)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfe00 -- stack pointer added automatically.
+define void @test_tbegin_nofloat8() {
+; CHECK-LABEL: test_tbegin_nofloat8:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65280
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 65024)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfb00 -- no frame pointer needed.
+define void @test_tbegin_nofloat9() {
+; CHECK-LABEL: test_tbegin_nofloat9:
+; CHECK: stmg %r10, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 64256
+; CHECK: br %r14
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 64256)
+  ret void
+}
+
+; TBEGIN (nofloat) with GRSM 0xfb00 -- frame pointer added automatically.
+define void @test_tbegin_nofloat10(i64 %n) {
+; CHECK-LABEL: test_tbegin_nofloat10:
+; CHECK: stmg %r11, %r15,
+; CHECK-NOT: std
+; CHECK: tbegin 0, 65280
+; CHECK: br %r14
+  %buf = alloca i8, i64 %n
+  call i32 @llvm.s390.tbegin.nofloat(i8 *null, i32 64256)
+  ret void
+}
+
+; TBEGINC.
+define void @test_tbeginc() {
+; CHECK-LABEL: test_tbeginc:
+; CHECK-NOT: stmg
+; CHECK-NOT: std
+; CHECK: tbeginc 0, 65288
+; CHECK: br %r14
+  call void @llvm.s390.tbeginc(i8 *null, i32 65288)
+  ret void
+}
+
+; TEND with integer CC return value.
+define i32 @test_tend1() {
+; CHECK-LABEL: test_tend1:
+; CHECK: tend
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  ret i32 %res
+}
+
+; TEND with implicit CC check.
+define void @test_tend3(i32 *%ptr) {
+; CHECK-LABEL: test_tend3:
+; CHECK: tend
+; CHECK: je  {{\.L*}}
+; CHECK: mvhi 0(%r2), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret void
+}
+
+; TEND with dual CC use.
+define i32 @test_tend2(i32 %pad, i32 *%ptr) {
+; CHECK-LABEL: test_tend2:
+; CHECK: tend
+; CHECK: ipm %r2
+; CHECK: srl %r2, 28
+; CHECK: cijlh %r2, 2,  {{\.L*}}
+; CHECK: mvhi 0(%r3), 0
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.tend()
+  %cmp = icmp eq i32 %res, 2
+  br i1 %cmp, label %if.then, label %if.end
+
+if.then:                                          ; preds = %entry
+  store i32 0, i32* %ptr, align 4
+  br label %if.end
+
+if.end:                                           ; preds = %if.then, %entry
+  ret i32 %res
+}
+
+; TABORT with register only.
+define void @test_tabort1(i64 %val) {
+; CHECK-LABEL: test_tabort1:
+; CHECK: tabort 0(%r2)
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 %val)
+  ret void
+}
+
+; TABORT with immediate only.
+define void @test_tabort2(i64 %val) {
+; CHECK-LABEL: test_tabort2:
+; CHECK: tabort 1234
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 1234)
+  ret void
+}
+
+; TABORT with register + immediate.
+define void @test_tabort3(i64 %val) {
+; CHECK-LABEL: test_tabort3:
+; CHECK: tabort 1234(%r2)
+; CHECK: br %r14
+  %sum = add i64 %val, 1234
+  call void @llvm.s390.tabort(i64 %sum)
+  ret void
+}
+
+; TABORT with out-of-range immediate.
+define void @test_tabort4(i64 %val) {
+; CHECK-LABEL: test_tabort4:
+; CHECK: tabort 0({{%r[1-5]}})
+; CHECK: br %r14
+  call void @llvm.s390.tabort(i64 4096)
+  ret void
+}
+
+; NTSTG with base pointer only.
+define void @test_ntstg1(i64 *%ptr, i64 %val) {
+; CHECK-LABEL: test_ntstg1:
+; CHECK: ntstg %r3, 0(%r2)
+; CHECK: br %r14
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with base and index.
+; Check that VSTL doesn't allow an index.
+define void @test_ntstg2(i64 *%base, i64 %index, i64 %val) {
+; CHECK-LABEL: test_ntstg2:
+; CHECK: sllg [[REG:%r[1-5]]], %r3, 3
+; CHECK: ntstg %r4, 0([[REG]],%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 %index
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with the highest in-range displacement.
+define void @test_ntstg3(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg3:
+; CHECK: ntstg %r3, 524280(%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 65535
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with an out-of-range positive displacement.
+define void @test_ntstg4(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg4:
+; CHECK: ntstg %r3, 0({{%r[1-5]}})
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 65536
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with the lowest in-range displacement.
+define void @test_ntstg5(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg5:
+; CHECK: ntstg %r3, -524288(%r2)
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 -65536
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; NTSTG with an out-of-range negative displacement.
+define void @test_ntstg6(i64 *%base, i64 %val) {
+; CHECK-LABEL: test_ntstg6:
+; CHECK: ntstg %r3, 0({{%r[1-5]}})
+; CHECK: br %r14
+  %ptr = getelementptr i64, i64 *%base, i64 -65537
+  call void @llvm.s390.ntstg(i64 %val, i64 *%ptr)
+  ret void
+}
+
+; ETND.
+define i32 @test_etnd() {
+; CHECK-LABEL: test_etnd:
+; CHECK: etnd %r2
+; CHECK: br %r14
+  %res = call i32 @llvm.s390.etnd()
+  ret i32 %res
+}
+
+; PPA (Transaction-Abort Assist)
+define void @test_ppa_txassist(i32 %val) {
+; CHECK-LABEL: test_ppa_txassist:
+; CHECK: ppa %r2, 0, 1
+; CHECK: br %r14
+  call void @llvm.s390.ppa.txassist(i32 %val)
+  ret void
+}
+
Index: llvm-head/test/MC/SystemZ/insn-bad-zEC12.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-bad-zEC12.s
+++ llvm-head/test/MC/SystemZ/insn-bad-zEC12.s
@@ -3,6 +3,22 @@
 # RUN: FileCheck < %t %s
 
 #CHECK: error: invalid operand
+#CHECK: ntstg	%r0, -524289
+#CHECK: error: invalid operand
+#CHECK: ntstg	%r0, 524288
+
+	ntstg	%r0, -524289
+	ntstg	%r0, 524288
+
+#CHECK: error: invalid operand
+#CHECK: ppa	%r0, %r0, -1
+#CHECK: error: invalid operand
+#CHECK: ppa	%r0, %r0, 16
+
+	ppa	%r0, %r0, -1
+	ppa	%r0, %r0, 16
+
+#CHECK: error: invalid operand
 #CHECK: risbgn	%r0,%r0,0,0,-1
 #CHECK: error: invalid operand
 #CHECK: risbgn	%r0,%r0,0,0,64
@@ -22,3 +38,47 @@
 	risbgn	%r0,%r0,-1,0,0
 	risbgn	%r0,%r0,256,0,0
 
+#CHECK: error: invalid operand
+#CHECK: tabort	-1
+#CHECK: error: invalid operand
+#CHECK: tabort	4096
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tabort	0(%r1,%r2)
+
+	tabort	-1
+	tabort	4096
+	tabort	0(%r1,%r2)
+
+#CHECK: error: invalid operand
+#CHECK: tbegin	-1, 0
+#CHECK: error: invalid operand
+#CHECK: tbegin	4096, 0
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tbegin	0(%r1,%r2), 0
+#CHECK: error: invalid operand
+#CHECK: tbegin	0, -1
+#CHECK: error: invalid operand
+#CHECK: tbegin	0, 65536
+
+	tbegin	-1, 0
+	tbegin	4096, 0
+	tbegin	0(%r1,%r2), 0
+	tbegin	0, -1
+	tbegin	0, 65536
+
+#CHECK: error: invalid operand
+#CHECK: tbeginc	-1, 0
+#CHECK: error: invalid operand
+#CHECK: tbeginc	4096, 0
+#CHECK: error: invalid use of indexed addressing
+#CHECK: tbeginc	0(%r1,%r2), 0
+#CHECK: error: invalid operand
+#CHECK: tbeginc	0, -1
+#CHECK: error: invalid operand
+#CHECK: tbeginc	0, 65536
+
+	tbeginc	-1, 0
+	tbeginc	4096, 0
+	tbeginc	0(%r1,%r2), 0
+	tbeginc	0, -1
+	tbeginc	0, 65536
Index: llvm-head/test/MC/SystemZ/insn-good-zEC12.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-good-zEC12.s
+++ llvm-head/test/MC/SystemZ/insn-good-zEC12.s
@@ -1,6 +1,48 @@
 # For zEC12 and above.
 # RUN: llvm-mc -triple s390x-linux-gnu -mcpu=zEC12 -show-encoding %s | FileCheck %s
 
+#CHECK: etnd	%r0                     # encoding: [0xb2,0xec,0x00,0x00]
+#CHECK: etnd	%r15                    # encoding: [0xb2,0xec,0x00,0xf0]
+#CHECK: etnd	%r7                     # encoding: [0xb2,0xec,0x00,0x70]
+
+	etnd	%r0
+	etnd	%r15
+	etnd	%r7
+
+#CHECK: ntstg	%r0, -524288            # encoding: [0xe3,0x00,0x00,0x00,0x80,0x25]
+#CHECK: ntstg	%r0, -1                 # encoding: [0xe3,0x00,0x0f,0xff,0xff,0x25]
+#CHECK: ntstg	%r0, 0                  # encoding: [0xe3,0x00,0x00,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 1                  # encoding: [0xe3,0x00,0x00,0x01,0x00,0x25]
+#CHECK: ntstg	%r0, 524287             # encoding: [0xe3,0x00,0x0f,0xff,0x7f,0x25]
+#CHECK: ntstg	%r0, 0(%r1)             # encoding: [0xe3,0x00,0x10,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 0(%r15)            # encoding: [0xe3,0x00,0xf0,0x00,0x00,0x25]
+#CHECK: ntstg	%r0, 524287(%r1,%r15)   # encoding: [0xe3,0x01,0xff,0xff,0x7f,0x25]
+#CHECK: ntstg	%r0, 524287(%r15,%r1)   # encoding: [0xe3,0x0f,0x1f,0xff,0x7f,0x25]
+#CHECK: ntstg	%r15, 0                 # encoding: [0xe3,0xf0,0x00,0x00,0x00,0x25]
+
+	ntstg	%r0, -524288
+	ntstg	%r0, -1
+	ntstg	%r0, 0
+	ntstg	%r0, 1
+	ntstg	%r0, 524287
+	ntstg	%r0, 0(%r1)
+	ntstg	%r0, 0(%r15)
+	ntstg	%r0, 524287(%r1,%r15)
+	ntstg	%r0, 524287(%r15,%r1)
+	ntstg	%r15, 0
+
+#CHECK: ppa	%r0, %r0, 0             # encoding: [0xb2,0xe8,0x00,0x00]
+#CHECK: ppa	%r0, %r0, 15            # encoding: [0xb2,0xe8,0xf0,0x00]
+#CHECK: ppa	%r0, %r15, 0            # encoding: [0xb2,0xe8,0x00,0x0f]
+#CHECK: ppa	%r4, %r6, 7             # encoding: [0xb2,0xe8,0x70,0x46]
+#CHECK: ppa	%r15, %r0, 0            # encoding: [0xb2,0xe8,0x00,0xf0]
+
+	ppa	%r0, %r0, 0
+	ppa	%r0, %r0, 15
+	ppa	%r0, %r15, 0
+	ppa	%r4, %r6, 7
+	ppa	%r15, %r0, 0
+
 #CHECK: risbgn	%r0, %r0, 0, 0, 0       # encoding: [0xec,0x00,0x00,0x00,0x00,0x59]
 #CHECK: risbgn	%r0, %r0, 0, 0, 63      # encoding: [0xec,0x00,0x00,0x00,0x3f,0x59]
 #CHECK: risbgn	%r0, %r0, 0, 255, 0     # encoding: [0xec,0x00,0x00,0xff,0x00,0x59]
@@ -17,3 +59,68 @@
 	risbgn	%r15,%r0,0,0,0
 	risbgn	%r4,%r5,6,7,8
 
+#CHECK: tabort	0                       # encoding: [0xb2,0xfc,0x00,0x00]
+#CHECK: tabort	0(%r1)                  # encoding: [0xb2,0xfc,0x10,0x00]
+#CHECK: tabort	0(%r15)                 # encoding: [0xb2,0xfc,0xf0,0x00]
+#CHECK: tabort	4095                    # encoding: [0xb2,0xfc,0x0f,0xff]
+#CHECK: tabort	4095(%r1)               # encoding: [0xb2,0xfc,0x1f,0xff]
+#CHECK: tabort	4095(%r15)              # encoding: [0xb2,0xfc,0xff,0xff]
+
+	tabort	0
+	tabort	0(%r1)
+	tabort	0(%r15)
+	tabort	4095
+	tabort	4095(%r1)
+	tabort	4095(%r15)
+
+#CHECK: tbegin	0, 0                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x00]
+#CHECK: tbegin	4095, 0                 # encoding: [0xe5,0x60,0x0f,0xff,0x00,0x00]
+#CHECK: tbegin	0, 0                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x00]
+#CHECK: tbegin	0, 1                    # encoding: [0xe5,0x60,0x00,0x00,0x00,0x01]
+#CHECK: tbegin	0, 32767                # encoding: [0xe5,0x60,0x00,0x00,0x7f,0xff]
+#CHECK: tbegin	0, 32768                # encoding: [0xe5,0x60,0x00,0x00,0x80,0x00]
+#CHECK: tbegin	0, 65535                # encoding: [0xe5,0x60,0x00,0x00,0xff,0xff]
+#CHECK: tbegin	0(%r1), 42              # encoding: [0xe5,0x60,0x10,0x00,0x00,0x2a]
+#CHECK: tbegin	0(%r15), 42             # encoding: [0xe5,0x60,0xf0,0x00,0x00,0x2a]
+#CHECK: tbegin	4095(%r1), 42           # encoding: [0xe5,0x60,0x1f,0xff,0x00,0x2a]
+#CHECK: tbegin	4095(%r15), 42          # encoding: [0xe5,0x60,0xff,0xff,0x00,0x2a]
+
+	tbegin	0, 0
+	tbegin	4095, 0
+	tbegin	0, 0
+	tbegin	0, 1
+	tbegin	0, 32767
+	tbegin	0, 32768
+	tbegin	0, 65535
+	tbegin	0(%r1), 42
+	tbegin	0(%r15), 42
+	tbegin	4095(%r1), 42
+	tbegin	4095(%r15), 42
+
+#CHECK: tbeginc	0, 0                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x00]
+#CHECK: tbeginc	4095, 0                 # encoding: [0xe5,0x61,0x0f,0xff,0x00,0x00]
+#CHECK: tbeginc	0, 0                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x00]
+#CHECK: tbeginc	0, 1                    # encoding: [0xe5,0x61,0x00,0x00,0x00,0x01]
+#CHECK: tbeginc	0, 32767                # encoding: [0xe5,0x61,0x00,0x00,0x7f,0xff]
+#CHECK: tbeginc	0, 32768                # encoding: [0xe5,0x61,0x00,0x00,0x80,0x00]
+#CHECK: tbeginc	0, 65535                # encoding: [0xe5,0x61,0x00,0x00,0xff,0xff]
+#CHECK: tbeginc	0(%r1), 42              # encoding: [0xe5,0x61,0x10,0x00,0x00,0x2a]
+#CHECK: tbeginc	0(%r15), 42             # encoding: [0xe5,0x61,0xf0,0x00,0x00,0x2a]
+#CHECK: tbeginc	4095(%r1), 42           # encoding: [0xe5,0x61,0x1f,0xff,0x00,0x2a]
+#CHECK: tbeginc	4095(%r15), 42          # encoding: [0xe5,0x61,0xff,0xff,0x00,0x2a]
+
+	tbeginc	0, 0
+	tbeginc	4095, 0
+	tbeginc	0, 0
+	tbeginc	0, 1
+	tbeginc	0, 32767
+	tbeginc	0, 32768
+	tbeginc	0, 65535
+	tbeginc	0(%r1), 42
+	tbeginc	0(%r15), 42
+	tbeginc	4095(%r1), 42
+	tbeginc	4095(%r15), 42
+
+#CHECK: tend                            # encoding: [0xb2,0xf8,0x00,0x00]
+
+	tend
Index: llvm-head/test/MC/SystemZ/insn-bad-z196.s
===================================================================
--- llvm-head.orig/test/MC/SystemZ/insn-bad-z196.s
+++ llvm-head/test/MC/SystemZ/insn-bad-z196.s
@@ -244,6 +244,11 @@
 	cxlgbr	%f0, 16, %r0, 0
 	cxlgbr	%f2, 0, %r0, 0
 
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: etnd	%r7
+
+	etnd	%r7
+
 #CHECK: error: invalid operand
 #CHECK: fidbra	%f0, 0, %f0, -1
 #CHECK: error: invalid operand
@@ -546,6 +551,16 @@
 	locr	%r0,%r0,-1
 	locr	%r0,%r0,16
 
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: ntstg	%r0, 524287(%r1,%r15)
+
+	ntstg	%r0, 524287(%r1,%r15)
+
+#CHECK: error: {{(instruction requires: processor-assist)?}}
+#CHECK: ppa	%r4, %r6, 7
+
+	ppa	%r4, %r6, 7
+
 #CHECK: error: {{(instruction requires: miscellaneous-extensions)?}}
 #CHECK: risbgn	%r1, %r2, 0, 0, 0
 
@@ -690,3 +705,24 @@
 	stocg	%r0,-524289,1
 	stocg	%r0,524288,1
 	stocg	%r0,0(%r1,%r2),1
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tabort	4095(%r1)
+
+	tabort	4095(%r1)
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tbegin	4095(%r1), 42
+
+	tbegin	4095(%r1), 42
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tbeginc	4095(%r1), 42
+
+	tbeginc	4095(%r1), 42
+
+#CHECK: error: {{(instruction requires: transactional-execution)?}}
+#CHECK: tend
+
+	tend
+
Index: llvm-head/test/MC/Disassembler/SystemZ/insns.txt
===================================================================
--- llvm-head.orig/test/MC/Disassembler/SystemZ/insns.txt
+++ llvm-head/test/MC/Disassembler/SystemZ/insns.txt
@@ -2503,6 +2503,15 @@
 # CHECK: ear %r15, %a15
 0xb2 0x4f 0x00 0xff
 
+# CHECK: etnd %r0
+0xb2 0xec 0x00 0x00
+
+# CHECK: etnd %r15
+0xb2 0xec 0x00 0xf0
+
+# CHECK: etnd %r7
+0xb2 0xec 0x00 0x70
+
 # CHECK: fidbr %f0, 0, %f0
 0xb3 0x5f 0x00 0x00
 
@@ -6034,6 +6043,36 @@
 # CHECK: ny %r15, 0
 0xe3 0xf0 0x00 0x00 0x00 0x54
 
+# CHECK: ntstg %r0, -524288
+0xe3 0x00 0x00 0x00 0x80 0x25
+
+# CHECK: ntstg %r0, -1
+0xe3 0x00 0x0f 0xff 0xff 0x25
+
+# CHECK: ntstg %r0, 0
+0xe3 0x00 0x00 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 1
+0xe3 0x00 0x00 0x01 0x00 0x25
+
+# CHECK: ntstg %r0, 524287
+0xe3 0x00 0x0f 0xff 0x7f 0x25
+
+# CHECK: ntstg %r0, 0(%r1)
+0xe3 0x00 0x10 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 0(%r15)
+0xe3 0x00 0xf0 0x00 0x00 0x25
+
+# CHECK: ntstg %r0, 524287(%r1,%r15)
+0xe3 0x01 0xff 0xff 0x7f 0x25
+
+# CHECK: ntstg %r0, 524287(%r15,%r1)
+0xe3 0x0f 0x1f 0xff 0x7f 0x25
+
+# CHECK: ntstg %r15, 0
+0xe3 0xf0 0x00 0x00 0x00 0x25
+
 # CHECK: oc 0(1), 0
 0xd6 0x00 0x00 0x00 0x00 0x00
 
@@ -6346,6 +6385,21 @@
 # CHECK: popcnt %r7, %r8
 0xb9 0xe1 0x00 0x78
 
+# CHECK: ppa %r0, %r0, 0
+0xb2 0xe8 0x00 0x00
+
+# CHECK: ppa %r0, %r0, 15
+0xb2 0xe8 0xf0 0x00
+
+# CHECK: ppa %r0, %r15, 0
+0xb2 0xe8 0x00 0x0f
+
+# CHECK: ppa %r4, %r6, 7
+0xb2 0xe8 0x70 0x46
+
+# CHECK: ppa %r15, %r0, 0
+0xb2 0xe8 0x00 0xf0
+
 # CHECK: risbg %r0, %r0, 0, 0, 0
 0xec 0x00 0x00 0x00 0x00 0x55
 
@@ -8062,6 +8116,93 @@
 # CHECK: sy %r15, 0
 0xe3 0xf0 0x00 0x00 0x00 0x5b
 
+# CHECK: tabort 0
+0xb2 0xfc 0x00 0x00
+
+# CHECK: tabort 0(%r1)
+0xb2 0xfc 0x10 0x00
+
+# CHECK: tabort 0(%r15)
+0xb2 0xfc 0xf0 0x00
+
+# CHECK: tabort 4095
+0xb2 0xfc 0x0f 0xff
+
+# CHECK: tabort 4095(%r1)
+0xb2 0xfc 0x1f 0xff
+
+# CHECK: tabort 4095(%r15)
+0xb2 0xfc 0xff 0xff
+
+# CHECK: tbegin 0, 0
+0xe5 0x60 0x00 0x00 0x00 0x00
+
+# CHECK: tbegin 4095, 0
+0xe5 0x60 0x0f 0xff 0x00 0x00
+
+# CHECK: tbegin 0, 0
+0xe5 0x60 0x00 0x00 0x00 0x00
+
+# CHECK: tbegin 0, 1
+0xe5 0x60 0x00 0x00 0x00 0x01
+
+# CHECK: tbegin 0, 32767
+0xe5 0x60 0x00 0x00 0x7f 0xff
+
+# CHECK: tbegin 0, 32768
+0xe5 0x60 0x00 0x00 0x80 0x00
+
+# CHECK: tbegin 0, 65535
+0xe5 0x60 0x00 0x00 0xff 0xff
+
+# CHECK: tbegin 0(%r1), 42
+0xe5 0x60 0x10 0x00 0x00 0x2a
+
+# CHECK: tbegin 0(%r15), 42
+0xe5 0x60 0xf0 0x00 0x00 0x2a
+
+# CHECK: tbegin 4095(%r1), 42
+0xe5 0x60 0x1f 0xff 0x00 0x2a
+
+# CHECK: tbegin 4095(%r15), 42
+0xe5 0x60 0xff 0xff 0x00 0x2a
+
+# CHECK: tbeginc 0, 0
+0xe5 0x61 0x00 0x00 0x00 0x00
+
+# CHECK: tbeginc 4095, 0
+0xe5 0x61 0x0f 0xff 0x00 0x00
+
+# CHECK: tbeginc 0, 0
+0xe5 0x61 0x00 0x00 0x00 0x00
+
+# CHECK: tbeginc 0, 1
+0xe5 0x61 0x00 0x00 0x00 0x01
+
+# CHECK: tbeginc 0, 32767
+0xe5 0x61 0x00 0x00 0x7f 0xff
+
+# CHECK: tbeginc 0, 32768
+0xe5 0x61 0x00 0x00 0x80 0x00
+
+# CHECK: tbeginc 0, 65535
+0xe5 0x61 0x00 0x00 0xff 0xff
+
+# CHECK: tbeginc 0(%r1), 42
+0xe5 0x61 0x10 0x00 0x00 0x2a
+
+# CHECK: tbeginc 0(%r15), 42
+0xe5 0x61 0xf0 0x00 0x00 0x2a
+
+# CHECK: tbeginc 4095(%r1), 42
+0xe5 0x61 0x1f 0xff 0x00 0x2a
+
+# CHECK: tbeginc 4095(%r15), 42
+0xe5 0x61 0xff 0xff 0x00 0x2a
+
+# CHECK: tend
+0xb2 0xf8 0x00 0x00
+
 # CHECK: tm 0, 0
 0x91 0x00 0x00 0x00
 

llvm-svn: 233803
2015-04-01 12:51:43 +00:00
Hal Finkel 50271aae7e [PowerPC] FastISel can't handle i1 return values when using CR bits
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.

Fixes PR22664.

llvm-svn: 233775
2015-04-01 00:40:48 +00:00
David Majnemer a225a19dd0 [WinEH] Generate .xdata for catch handlers
This lets us catch exceptions in simple cases.

N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
  that we generate.
- IP-to-State tables are sensitive to the order of emission.

llvm-svn: 233767
2015-03-31 22:35:44 +00:00
Sanjay Patel 0b464dc4a6 typo; NFC
llvm-svn: 233761
2015-03-31 21:24:47 +00:00
Hal Finkel 52368d4437 [PowerPC] Don't use a vector preferred memory type at -O0
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.

Related to PR22754.

llvm-svn: 233755
2015-03-31 20:56:09 +00:00
Quentin Colombet 6843ac470b [AArch64] Enable the codegenprepare optimization that promotes operation to form
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.

rdar://problem/19267165

llvm-svn: 233753
2015-03-31 20:52:32 +00:00
Simon Atanasyan 772944ad3d [Hexagon] Avoid an unused variable warning when assertions are off
No functional changes.

llvm-svn: 233740
2015-03-31 19:43:47 +00:00
Ulrich Weigand 050527b4ac [SystemZ] Address review comments for r233689
Change lowerCTPOP to:
- Gracefully handle a known-zero input value
- Simplify computation of significant bit size

Thanks to Jay Foad for the review!

llvm-svn: 233736
2015-03-31 19:28:50 +00:00
Sanjay Patel 30d589536a [X86, AVX] fix zero-extending integer operand load patterns to use integer instructions
This is a follow-on to r233704 and another partial fix for PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233724
2015-03-31 18:43:43 +00:00
Sanjay Patel 2ae9943881 [X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)

It improves the v4i64 case although not optimally. This AVX codegen:

  vmovq {{.*#+}} xmm0 = mem[0],zero
  vxorpd %ymm1, %ymm1, %ymm1
  vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]

Becomes:

  vmovsd {{.*#+}} xmm0 = mem[0],zero

Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:

    We're not handling v32i8 / v16i16.
    We're not getting the FP / int domains right for instruction selection.

But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, 
I'm submitting this patch ahead of fixing the above.

Differential Revision: http://reviews.llvm.org/D8341

llvm-svn: 233704
2015-03-31 16:32:11 +00:00
Ulrich Weigand daa72f5120 [SystemZ] Add Analysis to required_libraries (fall-out from r233688)
Should fix build failures with cmake builds

llvm-svn: 233700
2015-03-31 15:16:13 +00:00
Krzysztof Parzyszek c05dff1792 Expand MUX instructions early on Hexagon
This time with all files included.

llvm-svn: 233696
2015-03-31 13:35:12 +00:00
Krzysztof Parzyszek 8c4fd2bdeb Revert 233694. Weak SVN-fu.
llvm-svn: 233695
2015-03-31 13:32:32 +00:00
Krzysztof Parzyszek 261d62c862 Expand MUX instructions early on Hexagon
llvm-svn: 233694
2015-03-31 13:29:17 +00:00
Vladimir Sukharev 297bf0eae0 [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension
Reviewers: t.p.northover, jmolloy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8502

llvm-svn: 233693
2015-03-31 13:15:48 +00:00
Ulrich Weigand 371d10a4eb [SystemZ] Support RISBGN instruction on zEC12
So far, we do not yet support any instruction specific to zEC12.
Most of the facilities added with zEC12 are indeed not very useful
to compiler code generation, but there is one exception: the
miscellaneous-extensions facility provides the RISBGN instruction,
which is a variant of RISBG that does not set the condition code.

Add support for this facility, MC support for RISBGN, and CodeGen
support for prefering RISBGN over RISBG on zEC12, unless we can
actually make use of the condition code set by RISBG.

llvm-svn: 233690
2015-03-31 12:58:17 +00:00
Ulrich Weigand b401218ca2 [SystemZ] Use POPCNT instruction on z196
We already exploit a number of instructions specific to z196,
but not yet POPCNT.  Add support for the population-count
facility, MC support for the POPCNT instruction, CodeGen
support for using POPCNT, and implement the getPopcntSupport
TargetTransformInfo hook.

llvm-svn: 233689
2015-03-31 12:56:33 +00:00
Ulrich Weigand 1f6666a49c [SystemZ] Provide basic TargetTransformInfo implementation
This hooks up the TargetTransformInfo machinery for SystemZ,
and provides an implementation of getIntImmCost.

In addition, the patch adds the isLegalICmpImmediate and
isLegalAddImmediate TargetLowering overrides, and updates
a couple of test cases where we now generate slightly
better code.

llvm-svn: 233688
2015-03-31 12:52:27 +00:00
Rafael Espindola dd3add6c60 Fix the operand encoding in the test instruction.
Fixes pr22995.

llvm-svn: 233686
2015-03-31 12:31:55 +00:00
Ulrich Weigand 7aed6890fd [PowerPC] Remove TargetMachine CPU auto-detection
As was done for X86 in r206094.

llvm-svn: 233684
2015-03-31 12:01:06 +00:00
Ahmed Bougacha 77b5bb7ce2 [X86] Generate MOVNT for all vector types.
We used to miss non-Q YMM integer vectors, and, non-Q/D XMM integer
vectors.
While there, change the v4i32 patterns to prefer MOVNTDQ.

llvm-svn: 233668
2015-03-31 03:16:51 +00:00
Alexei Starovoitov 1819953db5 [bpf] mark mov instructions as ReMaterializable
loading immediate into register is cheap, so take advantage of remat.

llvm-svn: 233666
2015-03-31 02:49:58 +00:00
Quentin Colombet 387a0e7cce [AArch64] Fix poor codegen for add immediate.
We used to match the register variant before the immediate when the register
argument could be implicitly zero-extended.

llvm-svn: 233653
2015-03-31 00:31:13 +00:00
Eric Christopher f8019408dc Replace the MCSubtargetInfo parameter with a Triple when creating
an MCInstPrinter. Update all callers and use where we wanted a Triple
previously.

llvm-svn: 233648
2015-03-31 00:10:04 +00:00
Juergen Ributzka 5fe5ef9e0e Transfer implicit operands when expanding the RET_ReallyLR pseudo instruction.
When we expand the RET_ReallyLR pseudo instruction we also need to transfer the
implicit operands.

The return register is an implicit operand and without it the liveness
calculation generates an incorrect live-out set for the patchpoint.

This fixes rdar://problem/19068476.

llvm-svn: 233635
2015-03-30 22:45:56 +00:00
Alexei Starovoitov 2a90cde4fd [bpf] add support for bswap instructions
BPF has cpu_to_be and cpu_to_le instructions.
For now assume little endian and generate cpu_to_be for ISD::BSWAP.

llvm-svn: 233620
2015-03-30 22:40:40 +00:00
Alexei Starovoitov 78f99be1df [bpf] fix build
fix build broken due to r233599.
Would still need to switch to MDLocation long term.

llvm-svn: 233619
2015-03-30 22:40:35 +00:00
Eric Christopher 9c1bd05949 Remove unused MCSubtargetInfo argument from the X86 MCInstPrinter ctors.
llvm-svn: 233614
2015-03-30 22:16:37 +00:00
Eric Christopher d639cdf427 Remove unused MCSubtargetInfo argument from the Sparc MCInstPrinter ctors.
llvm-svn: 233612
2015-03-30 22:09:43 +00:00
Eric Christopher 48e697f7fe Remove unused MCSubtargetInfo argument from the NVPTX MCInstPrinter ctors.
llvm-svn: 233611
2015-03-30 22:03:16 +00:00
Eric Christopher 7099d51275 Remove unused MCSubtargetInfo argument from the ARM MCInstPrinter ctors.
llvm-svn: 233609
2015-03-30 21:52:28 +00:00
Eric Christopher 2226c72301 Remove unused MCSubtargetInfo argument from the AArch64 MCInstPrinter ctors.
llvm-svn: 233608
2015-03-30 21:52:26 +00:00
Eric Christopher c7c5592b7e Remove unused Target argument from MCInstPrinter ctor functions.
llvm-svn: 233607
2015-03-30 21:52:21 +00:00
Peter Collingbourne 915a4b13ef MC: For variable symbols, maintain MCSymbol::Section as a cache.
This fixes the visibility of symbols in certain edge cases involving aliases
with multiple levels of indirection.

Fixes PR19582.

Differential Revision: http://reviews.llvm.org/D8586

llvm-svn: 233595
2015-03-30 20:41:21 +00:00
Justin Holewinski 0afca26e90 [NVPTX] Associate a minimum PTX version for each SM architecture
When a new SM architecture is introduced, it is only supported by the
current PTX version and later.  Make sure we are using at least the
minimum PTX version for the target architecture.

This also removes support for PTX ISA < 3.2.

llvm-svn: 233583
2015-03-30 19:30:55 +00:00
Duncan P. N. Exon Smith 9dffcd04f7 CodeGen: Use the new DebugLoc API, NFC
Update lib/CodeGen (and lib/Target) to use the new `DebugLoc` API.

llvm-svn: 233582
2015-03-30 19:14:47 +00:00
Justin Holewinski f94d5b5137 [NVPTX] Add options for PTX 4.1/4.2 and SM 3.2/3.7/5.2/5.3
llvm-svn: 233575
2015-03-30 18:12:50 +00:00
Yaron Keren 075759aadd Remove more superfluous .str() and replace std::string concatenation with Twine.
Following r233392, http://llvm.org/viewvc/llvm-project?rev=233392&view=rev.

llvm-svn: 233555
2015-03-30 15:42:36 +00:00
Sanjay Patel bbe1756a8c more space; NFC
llvm-svn: 233554
2015-03-30 15:31:32 +00:00
Ulrich Weigand b8d76fb7ca [SystemZ] Fix LLVM crash on unoptimized code
Compiling the following function with -O0 would crash, since LLVM would
hit an assertion in getTestUnderMaskCond:

  int test(unsigned long x)
  {
    return x >= 0 && x <= 15;
  }

Fixed by detecting the case in the caller of getTestUnderMaskCond.

llvm-svn: 233541
2015-03-30 13:46:59 +00:00
Ulrich Weigand 58bb263eed [SystemZ] Remove TargetMachine CPU auto-detection
As was done for X86 in r206094.

llvm-svn: 233540
2015-03-30 13:46:25 +00:00
Daniel Sanders 82df616d8e [mips] Support 9-bit offsets for the 'R' inline assembly memory constraint.
Summary:
The 'R' constraint is actually supposed to be much more complicated than
this and is defined in terms of whether it will cause macro expansion in
the assembler. 'R' is getting less useful due to architecture changes and
ought to be replaced by other constraints. We therefore implement 9-bit
offsets which will work for all subtargets and all instructions.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8440

llvm-svn: 233537
2015-03-30 13:27:25 +00:00
Elena Demikhovsky d8fda62247 AVX-512: blank lines, duplicated tests, no functional changes
see comments http://reviews.llvm.org/D6835

llvm-svn: 233528
2015-03-30 09:29:28 +00:00
Elena Demikhovsky 98de9d6360 AVX-512: added intrinsics for VPAND, VPOR and VPXOR
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233525
2015-03-30 08:30:34 +00:00