Commit Graph

32357 Commits

Author SHA1 Message Date
David Blaikie 01a9a412a7 Avoid copying LiveInterval, this could lead to a double-delete
llvm-svn: 231154
2015-03-03 22:25:48 +00:00
David Blaikie 7f1e0565b3 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie bb8da4c08f Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
Paul Robinson 06a8eb8343 [X86][ELF] Correct relocation for DWARF TLS references
Previously we had only Linux using DTPOFF for these; all X86 ELF
targets should. Fixes a side issue mentioned in PR21077.

Differential Revision: http://reviews.llvm.org/D8011

llvm-svn: 231130
2015-03-03 21:01:27 +00:00
Sanjay Patel 36a2dc895f remove enum value names from comments; NFC
llvm-svn: 231129
2015-03-03 20:58:35 +00:00
Sanjay Patel 948602bd17 use bool operator shortcut; NFC
llvm-svn: 231123
2015-03-03 20:41:27 +00:00
Kit Barton 0cfa7b7ad0 Add the following 64-bit vector integer arithmetic instructions added in POWER8:
vaddudm
vsubudm
vmulesw
vmulosw
vmuleuw
vmulouw
vmuluwm
vmaxsd
vmaxud
vminsd
vminud
vcmpequd
vcmpequd.
vcmpgtsd
vcmpgtsd.
vcmpgtud
vcmpgtud.
vrld
vsld
vsrd
vsrad

Phabricator review: http://reviews.llvm.org/D7959

llvm-svn: 231115
2015-03-03 19:55:45 +00:00
Eric Christopher 43768e311c 80-column fixup.
llvm-svn: 231088
2015-03-03 17:54:39 +00:00
Chad Rosier 8e38f30e49 [AArch64] When combining constant mul of -3, prefer (sub x, (shl x, N)).
This change only effects codegen when the constant is -3.

llvm-svn: 231085
2015-03-03 17:31:01 +00:00
Michael Kuperstein 84dff4c94c [X86][Haswell][SchedModel] Fix patterns for scalar FMA3 variants.
llvm-svn: 231073
2015-03-03 15:47:02 +00:00
Elena Demikhovsky d207f17fa1 AVX-512: Moved patterns for masked load/store under avx_store, avx_load classes.
No functional changes.

llvm-svn: 231069
2015-03-03 15:03:35 +00:00
Craig Topper ef04b2b505 [X86] Remove some unused code from disassembler.
llvm-svn: 231055
2015-03-03 05:24:03 +00:00
Ahmed Bougacha afbd6887c4 [X86] Special-case 2x CMOV when custom-inserting.
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!

Follow-up to r231045.

rdar://19767934
Closes http://reviews.llvm.org/D8019

llvm-svn: 231046
2015-03-03 01:21:16 +00:00
Ahmed Bougacha 066d0b8e64 [X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov)).
Fold and/or of setcc's to double CMOV:

(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)

When we can't use the CMOV instruction, it might increase branch
mispredicts.  When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.

These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).

rdar://19767934
http://reviews.llvm.org/D7634

llvm-svn: 231045
2015-03-03 01:09:14 +00:00
Paul Robinson c583abc888 Remove useless .debug_macinfo section setup.
llvm-svn: 231001
2015-03-02 19:52:42 +00:00
Jan Vesely 468e055f54 R600: Use c++11 style for loop
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 230987
2015-03-02 18:56:52 +00:00
Paul Robinson 9f4cfc574e Revert r230979, should apply to all X86 ELF.
llvm-svn: 230985
2015-03-02 18:50:18 +00:00
Paul Robinson 10ae2e52de [PS4] Correct relocation for DWARF TLS references.
llvm-svn: 230979
2015-03-02 17:44:52 +00:00
Elena Demikhovsky 18fd49602b AVX-512: Add assembly parser support for Rounding mode
By Asaf Badouh <asaf.badouh@intel.com>

llvm-svn: 230962
2015-03-02 15:00:34 +00:00
Benjamin Kramer e86b0f186b NVPTX: Remove dead code.
Fun fact: This file was never referenced since the initial checkin of
the NVPTX backend.

llvm-svn: 230957
2015-03-02 13:16:28 +00:00
Vasileios Kalintiris e741eb2c7d [mips] Optimize conditional moves where RHS is zero.
Summary:
When the RHS of a conditional move node is zero, we can utilize the $zero
register by inverting the conditional move instruction and by swapping the
order of its True/False operands.

Reviewers: dsanders

Differential Revision: http://reviews.llvm.org/D7945

llvm-svn: 230956
2015-03-02 12:47:32 +00:00
Elena Demikhovsky 2689d78909 AVX-512: Simplified MOV patterns, no functional changes.
llvm-svn: 230954
2015-03-02 12:46:21 +00:00
Craig Topper 9c26bcca5a [X86] There are only 8 mask registers. Fail disassembly if instruction tries to reference more.
llvm-svn: 230931
2015-03-02 03:33:11 +00:00
Craig Topper 09b27e7b24 [X86] Fix diassembler crash on AVX512 cmpps/cmppd with immediate that doesn't fit in 5-bits. Fixes PR22743.
llvm-svn: 230924
2015-03-02 00:22:29 +00:00
Sanjoy Das e5d1466ab3 [AArch64] fix an invalid-iterator-use bug.
Summary:
In AArch64PromoteConstant::appendAndTransferDominatedUses,
`InsertPts[NewPt]` invalidates IPI.  Therefore, `InsertPts[NewPt] =
std::move(IPI->second)` is not legal.

This was caught by running `make check` with
http://reviews.llvm.org/D7931.

Reviewers: t.p.northover, grosbach, bkramer

Reviewed By: bkramer

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D7988

llvm-svn: 230923
2015-03-02 00:17:18 +00:00
Benjamin Kramer 42cc33e816 X86: Replace variadic function with init list. NFC.
llvm-svn: 230911
2015-03-01 21:47:40 +00:00
Benjamin Kramer 030133c5db ArrayRef: Remove the equals helper with many arguments.
With initializer lists there is a really neat idiomatic way to write
this, 'ArrayRef.equals({1, 2, 3, 4, 5})'. Remove the equal method which
always had a hard limit on the number of arguments. I considered
rewriting it with variadic templates but that's not really a good fit
for a function with homogeneous arguments.

'ArrayRef == {1, 2, 3, 4, 5}' would've been even more awesome, but C++11
doesn't allow init lists with binary operators.

llvm-svn: 230907
2015-03-01 21:05:05 +00:00
Benjamin Kramer 7149aabf8b Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

llvm-svn: 230894
2015-03-01 18:09:56 +00:00
Elena Demikhovsky 0995479e67 Reverted 230471 - gather scatter handling in table gen.
llvm-svn: 230892
2015-03-01 08:23:41 +00:00
Elena Demikhovsky 02ffd26023 AVX-512: Added mask and rounding mode for scalar arithmetics
Added more tests for scalar instructions to destinguish between AVX and AVX-512 forms.

llvm-svn: 230891
2015-03-01 07:44:04 +00:00
Craig Topper 782d620657 [X86] Remove the blendpd/blendps/pblendw/pblendd intrinsics. They can represented by shuffle_vector instructions.
llvm-svn: 230860
2015-02-28 19:33:17 +00:00
Alexei Starovoitov 1b7b56fbcc bpf: fix build
complete the plumbing of passing TargetRegisterInfo through
computeRegisterProperties started by r230583

llvm-svn: 230858
2015-02-28 18:03:04 +00:00
Benjamin Kramer 5fbfe2ffdc Convert push_back loops into append calls.
No functionality change intended.

llvm-svn: 230849
2015-02-28 13:20:15 +00:00
Benjamin Kramer f1362f6196 ArrayRefize memory operand folding. NFC.
llvm-svn: 230846
2015-02-28 12:04:00 +00:00
Benjamin Kramer 4f6ac16292 Replace std::copy with a back inserter with vector append where feasible
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.

llvm-svn: 230845
2015-02-28 10:11:12 +00:00
Bill Schmidt e3959eb54e [PowerPC] Fix PR22711 - Misaligned .toc section
Straightforward patch to emit an alignment directive when emitting a
TOC entry.  The test case was generated from the test in PR22711 that
demonstrated a misaligned .toc section.  The object code is run
through llvm-readobj to verify that the correct alignment has been
applied to the .toc section.

Thanks to Ulrich Weigand for running down where the fix was needed.

llvm-svn: 230801
2015-02-27 22:14:10 +00:00
Charles Davis 83687fb9e6 Target/X86: Never use the redzone for Win64 ABI functions.
Summary:
Until now, we did this (among other things) based on whether or not the
target was Windows. This is clearly wrong, not just for Win64 ABI functions
on non-Windows, but for System V ABI functions on Windows, too. In this
change, we make this decision based on the ABI the calling convention
specifies instead.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7953

llvm-svn: 230793
2015-02-27 21:11:16 +00:00
Hal Finkel 5c3cacf5c0 [PowerPC] Use vector types for memcpy and friends (sometimes)
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.

For QPX, we use vector loads are stores, but only for aligned memory accesses.

llvm-svn: 230788
2015-02-27 19:58:28 +00:00
Renato Golin a78995c0a0 Equally to NetBSD, Bitrig/ARM uses the Itanium-ABI.
Patch by Patrick Wildt.

llvm-svn: 230762
2015-02-27 16:35:27 +00:00
Zoran Jovanovic 71a33e2ad6 [mips][microMIPS] Change register class for GP register
Differential Revision: http://reviews.llvm.org/D7934

llvm-svn: 230760
2015-02-27 15:03:50 +00:00
Tom Stellard aec94b3bf3 R600/SI: Add missing mubuf instructions
llvm-svn: 230759
2015-02-27 14:59:46 +00:00
Tom Stellard 49282c92c5 R600/SI: Consistently put soffset before the offset operand for mubuf instructions
This matches the assembly syntax.

llvm-svn: 230758
2015-02-27 14:59:44 +00:00
Tom Stellard 1f9939fba6 R600/SI: Add slc, glc, and tfe to non-atomic _ADDR64 instructions
llvm-svn: 230757
2015-02-27 14:59:41 +00:00
Chandler Carruth 9ad2ffac23 [x86] Run most of the rest of the shuffle combining over non-128-bit
vectors. This lets us fix the rest of the v16 lowering problems when
pshufb is clearly better.

We might still be able to improve some of the lowerings by enabling the
other combine-based rewriting to fire for non-128-bit vectors, but this
at least should remove any regressions from using the fancy v16i16
lowering strategy.

llvm-svn: 230753
2015-02-27 12:13:14 +00:00
Chandler Carruth 66b705bc64 [x86] Teach a bunch of the x86-specific shuffle combining to work with
256-bit vectors as well as 128-bit vectors. Fixes some of the redundant
shuffles for v16i16.

llvm-svn: 230752
2015-02-27 11:45:13 +00:00
Chandler Carruth 97f3260f57 [x86] Make the v8i16 clever single-input shuffle lowering usable for
repeated 128-bit lane shuffles of wider vector types and use it to lower
256-bit v16i16 vector shuffles where applicable.

This should let us perfectly lowering the pattern of pshuflw and pshufhw
even for AVX2 256-bit patterns.

I've not added AVX-512 support, but it should be trivial for someone
working on that to wire up.

Note that currently this generates bad, long shuffle chains because we
don't combine 256-bit target shuffles. The subsequent patches will fix
that.

llvm-svn: 230751
2015-02-27 11:33:46 +00:00
Toma Tabacu 344c167436 [mips] Remove redundant periods from -mattr=help descriptions for MIPS.
Summary: Also fixes an infringement of the 80-column limit rule.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7910

llvm-svn: 230748
2015-02-27 10:44:02 +00:00
Chandler Carruth ddc4d085cc [x86] Make the single-input v8i16 lowering directly recurse rather than
going back through the entire vector shuffle lowering.

This is an important step to being able to re-use this logic.

llvm-svn: 230743
2015-02-27 09:11:38 +00:00
Vasileios Kalintiris 18581f16b4 [mips] Account for constant-zero operands in ADDE nodes.
Summary:
We identify the cases where the operand to an ADDE node is a constant
zero. In such cases, we can avoid generating an extra ADDu instruction
disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r).

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7906

llvm-svn: 230742
2015-02-27 09:01:39 +00:00
Charles Davis 84d28de627 Target/X86: Save Win64 non-volatile registers in a Win64 ABI function.
Summary:
This change causes us to actually save non-volatile registers in a Win64
ABI function that calls a System V ABI function, and vice-versa.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7919

llvm-svn: 230714
2015-02-27 00:57:01 +00:00
Eric Christopher 1cdefae9c4 Rewrite MachineOperand::print and MachineInstr::print to avoid
uses of TM->getSubtargetImpl and propagate to all calls.

This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.

llvm-svn: 230710
2015-02-27 00:11:34 +00:00
Eric Christopher 11e4df73c8 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Chandler Carruth 653773d004 [x86] Fix PR22706 where we would incorrectly try lower a v32i8 dynamic
blend as legal.

We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.

llvm-svn: 230695
2015-02-26 22:15:34 +00:00
Chandler Carruth 7bd840d058 [x86] Restructure the comments and the conditions for handling
dynamic blends.

This makes it much more clear what is going on. The case we're handling
is that of dynamic conditions, and we're bailing when the nature of the
vector types and subtarget preclude lowering the dynamic condition
vselect as an actual blend.

No functionality changed here, but this will make a subsequent bug-fix
to this code much more clear.

llvm-svn: 230690
2015-02-26 21:29:06 +00:00
Chandler Carruth efc6819041 [x86] Re-order the combines of select in the X86 backend. This doesn't
change functionality, but makes it more clear that the dynamic case and
the shuffle case don't overlap in any interesting way.

llvm-svn: 230689
2015-02-26 21:21:36 +00:00
Chandler Carruth 0757f14c69 [x86] Add an assert to catch if we ever try to blend a v32i8 without
AVX2.

llvm-svn: 230688
2015-02-26 21:18:20 +00:00
Reid Kleckner e81017248c Don't sibcall between SysV and Win64 convention functions
The shadow stack space expectations won't match.

Fixes PR22709.

llvm-svn: 230667
2015-02-26 19:43:20 +00:00
Petar Jovanovic 90ec1b175e Fix justify error for small structures in varargs for MIPS64BE
There was a problem when passing structures as variable arguments.
The structures smaller than 64 bit were not left justified on MIPS64
big endian. This is now fixed by shifting the value to make it left-
justified when appropriate.

This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D7881

llvm-svn: 230657
2015-02-26 18:35:15 +00:00
Sumanth Gundapaneni 28a3b86b06 Use ".arch_extension" ARM directive to support hwdiv on krait
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the
features bits are not computed. This patch lets the asm printer
emit ".cpu cortex-a9" directive for krait and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since
it is not supported bu GNU GAS yet

llvm-svn: 230651
2015-02-26 18:08:41 +00:00
Sumanth Gundapaneni a9049ea368 Use ".arch_extension" ARM directive to specify the additional CPU features
This patch is in response to r223147 where the avaiable features are
computed based on ".cpu" directive. This will work clean for the standard
variants like cortex-a9. For custom variants which rely on standard cpu names
for assembly, the additional features of a CPU should be propagated. This can be
done via ".arch_extension" as long as the assembler supports it. The
implementation for krait along with unit test will be submitted in next patch.

llvm-svn: 230650
2015-02-26 18:07:35 +00:00
Tom Stellard eb05c610b4 R600/SI: Remove M0 from DS assembly strings
This matches the assembly syntax for the proprietary compiler.

llvm-svn: 230645
2015-02-26 17:08:43 +00:00
Michael Kuperstein 4af7449659 [X86][Haswell][SchedModel] Fix WriteMULm latency.
The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5). 
A better estimate would be 4 added to WriteMULr, that is, 9.

llvm-svn: 230634
2015-02-26 14:30:09 +00:00
Chandler Carruth 8e0a3ea52c [x86] Sink the single-input v8i16 lowering code that is actually
formulaic into the top v8i16 lowering routine.

This makes the generalized lowering a completely general and single path
lowering which will allow generalizing it in turn for multiple 128-bit
lanes.

llvm-svn: 230623
2015-02-26 11:00:40 +00:00
Chandler Carruth 11e7f6b50a [x86] Remove a SimpleTy usage. No need for it here, we already have the
MVT.

llvm-svn: 230622
2015-02-26 10:37:01 +00:00
Chandler Carruth d283cb6203 [x86] Make the vector shuffle helpers order the SDLoc and MVT arguments.
This ordering matches that of DAG.getNode.

llvm-svn: 230617
2015-02-26 08:19:24 +00:00
Reid Kleckner e2008ae475 Pass /nologo to ml64 for quieter builds
It still prints "Assembling path/to/X86CompilationCallback_Win64.asm",
but linking does the same thing.

llvm-svn: 230596
2015-02-26 00:51:33 +00:00
Eric Christopher 5f54195e4a Remove a FIXME.
Explanation: This function is in TargetLowering because it uses
RegClassForVT which would need to be moved to TargetRegisterInfo
and would necessitate moving isTypeLegal over as well - a massive
change that would just require TargetLowering having a TargetRegisterInfo
class member that it would use.

llvm-svn: 230585
2015-02-26 00:00:35 +00:00
Eric Christopher 23a3a7c871 Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

llvm-svn: 230583
2015-02-26 00:00:24 +00:00
Hal Finkel cf59921670 [PowerPC] Make LDtocL and friends invariant loads
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.

Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.

Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.

This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.

llvm-svn: 230553
2015-02-25 21:36:59 +00:00
David Majnemer e1bbad9eb2 X86, Win64: Allow 'mov' to restore the stack pointer if we have a FP
The Win64 epilogue structure is very restrictive, it permits a very
small number of opcodes and none of them are 'mov'.

This means that given:
  mov %rbp, %rsp
  pop %rbp

The mov isn't the epilogue, only the pop is.  This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function.  If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is reseting the stack
pointer.

llvm-svn: 230543
2015-02-25 21:13:37 +00:00
Hal Finkel 0746211811 [PowerPC] Cleanup unused target-specific SDAG nodes
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.

llvm-svn: 230518
2015-02-25 18:06:45 +00:00
Matthias Braun 02892ec62d AArch64: Add debug message for large shift constants.
As requested in code review.

llvm-svn: 230517
2015-02-25 18:03:50 +00:00
Vladimir Medic bcb7467540 [MIPS]Multiple and add instructions for Mips are currently available in mips32r2/mips64r2 and later but should also be available in mips4, mips5, and mips64. This patch fixes the requested features and updates the corresponding test files.
llvm-svn: 230500
2015-02-25 15:24:37 +00:00
Bruno Cardoso Lopes ab7afa9144 [X86][MMX] Reapply: Add MMX instructions to foldable tables
Reapply r230248.

Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

llvm-svn: 230499
2015-02-25 15:14:02 +00:00
Bruno Cardoso Lopes 48b10681f9 [X86][MMX] Prevent MMX_MOVD64rm folding
MMX_MOVD64rm zero-extends i32 load results into i64 registers.

The peephole optimizer will try to fold it in other MMX foldable
instructions, the wrong thing to do, since there's no MMX memory
instruction that loads from i32 and does implict zero extension.

Remove 'canFoldAsLoad' from MOVD64rm in order to prevent such folding.
The current MMX tests already test this, but since there are no MMX
instructions in the foldable tables yet, this did not trigger. This
commit prepares the addition of those instructions.

llvm-svn: 230498
2015-02-25 15:13:52 +00:00
Renato Golin b9887ef32a Improve handling of stack accesses in Thumb-1
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:

* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
  the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
  to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.

Patch by John Brawn.

llvm-svn: 230496
2015-02-25 14:41:06 +00:00
Aaron Ballman 5561ed448b Silencing a "result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)" warning in MSVC; NFC.
llvm-svn: 230489
2015-02-25 13:05:24 +00:00
Aaron Ballman 70c27ded97 Silencing a -Wsign-compare warning triggered in MSVC; NFC.
llvm-svn: 230488
2015-02-25 13:02:23 +00:00
Elena Demikhovsky 56eadcf5ce AVX-512: Gather and Scatter patterns
Gather and scatter instructions additionally write to one of the source operands - mask register.
In this case Gather has 2 destination values - the loaded value and the mask.
Till now we did not support code gen pattern for gather - the instruction was generated from 
intrinsic only and machine node was hardcoded.
When we introduce the masked_gather node, we need to select instruction automatically,
in the standard way.
I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands.

(Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big
patch in many small patches)

llvm-svn: 230471
2015-02-25 09:46:31 +00:00
Hal Finkel c93a9a2cb4 [PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially  { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

llvm-svn: 230413
2015-02-25 01:06:45 +00:00
Eric Christopher fe59972bbc Rename UpdateRegAllocHint to match style guidelines.
llvm-svn: 230357
2015-02-24 19:10:57 +00:00
Matthias Braun 7526035155 AArch64: Relax assert about large shift sizes.
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).

Differential Revision: http://reviews.llvm.org/D6940

llvm-svn: 230355
2015-02-24 18:52:04 +00:00
Tom Stellard ecc419c31d R600/SI: Remove isel mubuf legalization
We legalize mubuf instructions post-instruction selection, so this
code is no longer needed.

llvm-svn: 230352
2015-02-24 17:59:19 +00:00
Tim Northover e95c5b3236 ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.

Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.

llvm-svn: 230348
2015-02-24 17:22:34 +00:00
Sanjay Patel a709f3a5ae simplify control flow; NFC
llvm-svn: 230342
2015-02-24 16:26:02 +00:00
Michael Kuperstein d2f3b87812 [x32] Mark RBX as reserved when EBX is the base pointer.
This should have gone into r230334.

llvm-svn: 230339
2015-02-24 16:13:16 +00:00
Sanjay Patel 2898548598 fix typo in comment; NFC
llvm-svn: 230338
2015-02-24 16:11:05 +00:00
Michael Kuperstein 8ffb409135 [x32] x32 should use ebx as the base pointer.
This fixes the original issue in PR22655, but not the secondary one.

llvm-svn: 230334
2015-02-24 15:27:13 +00:00
Toma Tabacu a90f144a1d [mips] Reformat some TableGen definitions. NFC.
Summary: Separated some instruction and pseudo-instruction definitions from InstAlias definitions, added banner for pseudo-instructions and removed a redundant whitespace from a pseudo-instruction definition. No functional change.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7552

llvm-svn: 230327
2015-02-24 11:52:19 +00:00
Craig Topper cf51397c48 [X86] Remove the AbsMem32 type from the assembly parser. Only really need the 16-bit version which will automatically get prioritized over AbsMem.
llvm-svn: 230313
2015-02-24 08:02:13 +00:00
Reed Kotler 5fb7d8b508 Beginning of alloca implementation for Mips fast-isel
Summary: Begin to add various address modes; including alloca.

Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: echristo, rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6426

llvm-svn: 230300
2015-02-24 02:36:45 +00:00
Bob Wilson 8e29dec986 Fix handling of negative offsets for AddrModeT2_i8s4 in rewriteT2FrameIndex.
This is a follow up to r230233 to fix something that I noticed by
inspection. The AddrModeT2_i8s4 addressing mode does not support
negative offsets. I spent a good chunk of the day trying to come up with
a testcase for this but was not successful. This addressing mode is used
to spill and restore GPRPair registers in Thumb2 code and that does not
happen often. We also make very limited used of negative offsets when
lowering frame indexes. I am going ahead with the change anyway, because
I am pretty confident that it is correct. I also added a missing assertion
to check that the low bits of the scaled offset are zero.

llvm-svn: 230297
2015-02-24 01:37:31 +00:00
David Majnemer 3aa0bd81a2 X86: Only use 'lea' in Win64 epilogues if a frame pointer exists
We can only use 'add' in epilogues, 'lea' is not permitted unless we've
established a frame pointer in the prologue.

llvm-svn: 230286
2015-02-24 00:11:32 +00:00
David Majnemer 006c490ba8 X86: Use a smaller 'mov' instruction for stack probe calls
Prologue emission, in some cases, requires calls to a stack probe helper
function.  The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.

Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.

llvm-svn: 230270
2015-02-23 21:50:30 +00:00
David Majnemer 31d868b618 X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible
'mov' and 'lea' are equivalent when the displacement applied with 'lea'
is zero.  However, 'mov' should encode smaller.

llvm-svn: 230269
2015-02-23 21:50:27 +00:00
David Majnemer b85e023b8b X86: Explain why we cannot use a 'mov' in a Win64 epilogue
llvm-svn: 230268
2015-02-23 21:50:25 +00:00
David Majnemer 086f6a7e6e X86: Consistently use 'epilogue' instead of 'epilog'
llvm-svn: 230267
2015-02-23 21:50:18 +00:00
Bruno Cardoso Lopes 24492b057e [AsmPrinter] Access pointers to globals via pcrel GOT entries
Front-ends could use global unnamed_addr to hold pointers to other
symbols, like @gotequivalent below:

@foo = global i32 42
@gotequivalent = private unnamed_addr constant i32* @foo

@delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequivalent to i64),
                                    i64 ptrtoint (i32* @delta to i64))
                           to i32)

The global @delta holds a data "PC"-relative offset to @gotequivalent,
an unnamed pointer to @foo. The darwin/x86-64 assembly output for this follows:

 .globl  _foo
_foo:
 .long   42

 .globl  _gotequivalent
_gotequivalent:
 .quad   _foo

 .globl  _delta
_delta:
 .long   _gotequivalent-_delta

Since unnamed_addr indicates that the address is not significant, only
the content, we can optimize the case above by replacing pc-relative
accesses to "GOT equivalent" globals, by a PC relative access to the GOT
entry of the final symbol instead. Therefore, "delta" can contain a pc
relative relocation to foo's GOT entry and we avoid the emission of
"gotequivalent", yielding the assembly code below:

 .globl  _foo
_foo:
 .long   42

 .globl  _delta
_delta:
 .long   _foo@GOTPCREL+4

There are a couple of advantages of doing this: (1) Front-ends that need
to emit a great deal of data to store pointers to external symbols could
save space by not emitting such "got equivalent" globals and (2) IR
constructs combined with this opt opens a way to represent GOT pcrel
relocations by using the LLVM IR, which is something we previously had
no way to express.

Differential Revision: http://reviews.llvm.org/D6922

rdar://problem/18534217

llvm-svn: 230264
2015-02-23 21:26:18 +00:00
Bruno Cardoso Lopes 32173cdf06 Revert "[X86][MMX] Add MMX instructions to foldable tables"
This reverts commit r230226 since it breaks win buildbots.

llvm-svn: 230248
2015-02-23 19:53:37 +00:00
Eric Christopher ed47b22951 Rewrite the global merge pass to be subprogram agnostic for now.
It was previously using the subtarget to get values for the global
offset without actually checking each function as it was generating
code. Go ahead and solidify the current behavior and make the
existing FIXMEs more prominent.

As a note the ARM backend previously had a thumb1 and non-thumb1
set of defaults. Only the former was tested so I've changed the
behavior to only use that for now.

llvm-svn: 230245
2015-02-23 19:28:45 +00:00
Chad Rosier 543900539f Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
This patch adds the isProfitableToHoist API.  For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.

Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>

llvm-svn: 230241
2015-02-23 19:15:16 +00:00
Daniel Sanders afe27c7d27 [mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled.
Summary:
-mno-odd-spreg prohibits the use of odd-numbered single-precision floating
point registers. However, vector insert/extract was still using them when
manipulating the subregisters of an MSA register. Fixed this by ensuring
that insertion/extraction is only performed on even-numbered vector
registers when -mno-odd-spreg is given.

Reviewers: vmedic, sstankovic

Reviewed By: sstankovic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7672

llvm-svn: 230235
2015-02-23 17:22:16 +00:00
Bob Wilson 89e94fc3ad Fix incorrect immediate size for AddrModeT2_i8s4 in rewriteT2FrameIndex.
The natural way to handle this addressing mode would be to say that it has
8 bits and gets scaled by 4, but since the MC layer is expecting the scaling
to be already reflected in the immediate value, we have been setting the
Scale to 1. That's fine, but then NumBits needs to be adjusted to reflect
the effective increase in the range of the immediate. That adjustment was
missing.

The consequence is that the register scavenger can fail.
The estimateRSStackSizeLimit() function in ARMFrameLowering.cpp correctly
assumes that the AddrModeT2_i8s4 address mode can handle scaled offsets up to
1020. Under just the right circumstances, we fail to reserve space for the
scavenger because it thinks that nothing will be needed. However, the overly
pessimistic behavior in rewriteT2FrameIndex causes some frame indexes to be
out of range and require scavenged registers, and so the scavenger asserts.

Unfortunately I have not been able to come up with a testcase for this. I
can only reproduce it on an internal branch where the frame layout and
register allocation is slightly different than trunk. We really need a
way to serialize MachineInstr-level IR to write reasonable tests for things
like this.

rdar://problem/19909005

llvm-svn: 230233
2015-02-23 16:57:19 +00:00
Bruno Cardoso Lopes f488e2ae69 [X86][MMX] Add MMX instructions to foldable tables
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

llvm-svn: 230226
2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes 9e1c4c17d9 [X86][MMX] Support folding loads in psll, psrl and psra intrinsics
llvm-svn: 230225
2015-02-23 15:23:14 +00:00
Elena Demikhovsky 52e81bc499 AVX-512: recommitted 229837 + bugfix + test
llvm-svn: 230223
2015-02-23 15:12:31 +00:00
Elena Demikhovsky 145e5b4409 restructured X86 scalar unary operation templates
I made the templates general, no need to define pattern separately for each instruction/intrinsic.
Now only need to add r_Int pattern for AVX.

llvm-svn: 230221
2015-02-23 14:14:02 +00:00
NAKAMURA Takumi 3d61760bd6 Fix a warning on HexagonMCCodeEmitter::MCII. [-Wunused-private-field]
llvm-svn: 230170
2015-02-22 09:58:29 +00:00
Craig Topper 8659344d93 [X86] Add some missing redundant MMX and SSE encodings for disassembler.
llvm-svn: 230165
2015-02-22 07:50:41 +00:00
Matt Arsenault f07833057c R600/SI: Use v_madmk_f32
llvm-svn: 230149
2015-02-21 21:29:10 +00:00
Matt Arsenault 0325d3d27f R600/SI: Try to use v_madak_f32
This is a code size optimization when the constant
only has one use.

llvm-svn: 230148
2015-02-21 21:29:07 +00:00
Matt Arsenault 657b1cb739 R600/SI: Don't crash when getting immediate operand size
llvm-svn: 230147
2015-02-21 21:29:04 +00:00
Matt Arsenault 70120fa813 R600/SI: Fix mad*k definitions
llvm-svn: 230146
2015-02-21 21:29:00 +00:00
Benjamin Kramer cd8e4ebadb Remove dead prototype.
llvm-svn: 230137
2015-02-21 14:35:00 +00:00
Benjamin Kramer 71bfe3d1a4 X86: Remove custom lowering of SIGN_EXTEND_INREG
This was just replicating logic from the legalizer. Covered by existing
tests.

llvm-svn: 230136
2015-02-21 14:31:29 +00:00
Eric Christopher cde54069f9 Remove obsolete comment.
llvm-svn: 230134
2015-02-21 08:48:23 +00:00
Eric Christopher 327fc9721c Have the MipsAsmPrinter fp stub emission code take a custom
MCSubtargetInfo as the MachineFunction has gone away and we need
to emit code at the module level.

llvm-svn: 230133
2015-02-21 08:48:22 +00:00
Eric Christopher d5bc07e866 Turn an if+llvm_unreachable into an assert and reword comment.
llvm-svn: 230132
2015-02-21 08:32:38 +00:00
Eric Christopher bb40164e48 Endianness can be gotten from the DataLayout which we already
have. Also, the subtarget is invalid at this point.

llvm-svn: 230131
2015-02-21 08:32:22 +00:00
David Majnemer d5ab35f265 X86: Call __main using the SelectionDAG
Synthesizing a call directly using the MI layer would confuse the frame
lowering code.  This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.

llvm-svn: 230129
2015-02-21 05:49:45 +00:00
Tim Northover 3b6b7ca2bc CodeGen: convert CCState interface to using ArrayRefs
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.

There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.

llvm-svn: 230118
2015-02-21 02:11:17 +00:00
David Majnemer 89d0564b6a Win64: Stack alignment constraints aren't applied during SET_FPREG
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.

This fixes PR22572.

llvm-svn: 230113
2015-02-21 01:04:47 +00:00
Reid Kleckner 8142a08ce7 X86: Remove pre-2010 dead code in mergeSPUpdatesDown
llvm-svn: 230075
2015-02-20 22:13:25 +00:00
Simon Pilgrim b7875837c7 LowerScalarImmediateShift - Merged v16i8 and v32i8 shift lowering. NFC.
llvm-svn: 230074
2015-02-20 22:13:03 +00:00
Matt Arsenault 20711b7bae R600/SI: Remove v_sub_f64 pseudo
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.

llvm-svn: 230072
2015-02-20 22:10:45 +00:00
Matt Arsenault 8d6300346f R600: Use new fmad node.
This enables a few useful combines that used to only
use fma.

Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.

llvm-svn: 230071
2015-02-20 22:10:41 +00:00
Jozef Kolek 0365675522 Reversed revision 229706. The reason is regression, which is caused by the
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.

llvm-svn: 230053
2015-02-20 20:26:52 +00:00
Eric Christopher db51e2a52a Fix an asan use-after-free bug introduced by the asm printer
changes to remove non-Function based subtargets out of the asm
printer. For module level emission we'll need to construct up
an MCSubtargetInfo so that we can encode instructions for
emission.

llvm-svn: 230050
2015-02-20 19:54:07 +00:00
Andrea Di Biagio 7035178aeb [X86][FastIsel] Teach how to select float-half conversion intrinsics.
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.

Differential Revision: http://reviews.llvm.org/D7673

llvm-svn: 230043
2015-02-20 19:37:14 +00:00
Eric Christopher 9cda4b7ed9 Remove a use of the Subtarget in the darwin ppc asm printer.
EmitFunctionStubs is called from doFinalization and so can't
depend on the Subtarget existing. It's also irrelevant as
we know we're darwin since we're in the darwin asm printer.

llvm-svn: 230039
2015-02-20 18:53:42 +00:00
Eric Christopher 1df0c519fc Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230037
2015-02-20 18:44:15 +00:00
Sanjay Patel 1b53f74c4c canonicalize a v2f64 blendi of 2 registers
This canonicalization step saves us 3 pattern matching possibilities * 4 math ops
for scalar FP math that uses xmm regs. The backend can re-commute the operands
post-instruction-selection if that makes register allocation better.

The tests in llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll cover this scenario already,
so there are no new tests with this patch.

Differential Revision: http://reviews.llvm.org/D7777

llvm-svn: 230024
2015-02-20 16:55:27 +00:00
Kit Barton 263edb99ab I incorrectly marked the VORC instruction as isCommutable when I added it.
This fix removes the VORC instruction definition from the isCommutable block.

Phabricator review: http://reviews.llvm.org/D7772

llvm-svn: 230020
2015-02-20 15:54:58 +00:00
Chandler Carruth 0c5f059865 [x86] Switching the shuffle equivalence test to a variadic template was
the wrong answer. We also got initializer lists which are *way* cleaner
for this kind of thing. Let's use those and make this a normal, boring
functionn accepting ArrayRef.

llvm-svn: 230004
2015-02-20 10:47:28 +00:00
Eric Christopher 0218f8cfed Fix wording and grammar in Mips subtarget options.
llvm-svn: 230001
2015-02-20 08:42:34 +00:00
Eric Christopher 3ee30d0607 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 230000
2015-02-20 08:39:06 +00:00
Eric Christopher 22b2ad265f Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 229999
2015-02-20 08:24:37 +00:00
Eric Christopher 155290edf9 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 229998
2015-02-20 08:24:34 +00:00
Eric Christopher ad1ef04ab1 Save the MachineFunction in startFunction so that we can use it for
lookups of the subtarget later.

llvm-svn: 229996
2015-02-20 08:01:55 +00:00
Eric Christopher 4369c9b42c Use the cached subtarget from the MachineFunction rather than
doing a lookup on the TargetMachine.

llvm-svn: 229995
2015-02-20 08:01:52 +00:00
Eric Christopher 1947a9e2e2 Make the TargetMachine::getSubtarget that takes a Function argument
take a reference to match the getSubtargetImpl that takes a Function
argument.

llvm-svn: 229994
2015-02-20 07:32:59 +00:00
Nick Lewycky a2bda08806 Fix build in release mode, -Wunused-variable on this lambda function used only in an assert.
llvm-svn: 229977
2015-02-20 07:16:17 +00:00
David Blaikie 9a3644c472 Fix -Wunused-variable warning in non-asserts build, and optimize a little bit while I'm here.
llvm-svn: 229970
2015-02-20 06:28:38 +00:00
Hal Finkel e5aaf3f2cd [PowerPC] Loop Data Prefetching for the BG/Q
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.

I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.

Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).

llvm-svn: 229966
2015-02-20 05:08:21 +00:00
Chandler Carruth 4041f2217b [x86] Remove the old vector shuffle lowering code and its flag.
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]

Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.

In case you aren't using a diff algorithm that can handle this:
  X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)

llvm-svn: 229964
2015-02-20 04:25:04 +00:00
Chandler Carruth eb206aa1ea [x86] Now that the new vector shuffle legality is enabled and everything
is going well, remove the flag and the code for the old legality tests.

This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.

llvm-svn: 229963
2015-02-20 03:59:35 +00:00
Chandler Carruth d2b14b296c [x86] Make the new vector shuffle legality test on by default, which
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.

My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.

llvm-svn: 229958
2015-02-20 03:05:47 +00:00
Eric Christopher 0d94fa98e5 Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics."
The instructions were being generated on architectures that don't support avx512.

This reverts commit r229837.

llvm-svn: 229942
2015-02-20 00:45:28 +00:00
Eric Christopher 06b32cdfed Add a license header to the AVX512 file.
llvm-svn: 229941
2015-02-20 00:36:53 +00:00
Ahmed Bougacha db141ac37d [ARM] Re-re-apply VLD1/VST1 base-update combine.
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.

However, for ARMISD nodes, we just used the MMO alignment, no matter
what.  In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.

And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).

To fix this, we can just mirror the hack done for ISD nodes:  only
take into account the MMO alignment when the access is overaligned.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

rdar://19717869, rdar://14062261.

llvm-svn: 229932
2015-02-19 23:52:41 +00:00
Ahmed Bougacha dfdf54bed0 [ARM] Minor cleanup to CombineBaseUpdate. NFC.
In preparation for a future patch:
- rename isLoad to isLoadOp: the former is confusing, and can be taken
  to refer to the fact that the node is an ISD::LOAD.  (it isn't, yet.)
- change formatting here and there.
- add some comments.
- const-ify bools.

llvm-svn: 229929
2015-02-19 23:30:37 +00:00
Ahmed Bougacha 4c2b0781a5 [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Colin LeMahieu 1174fea31c [Hexagon] Moving remaining methods off of HexagonMCInst in to HexagonMCInstrInfo and eliminating HexagonMCInst class.
llvm-svn: 229914
2015-02-19 21:10:50 +00:00
Eric Christopher 64d35be6d6 Remove unused argument from emitInlineAsmStart.
llvm-svn: 229907
2015-02-19 19:52:25 +00:00
Colin LeMahieu 745c4710db [Hexagon] Moving more functions off of HexagonMCInst and in to HexagonMCInstrInfo.
llvm-svn: 229903
2015-02-19 19:49:27 +00:00
Colin LeMahieu af304e5192 [Hexagon] Creating HexagonMCInstrInfo namespace as landing zone for static functions detached from HexagonMCInst.
llvm-svn: 229885
2015-02-19 19:00:00 +00:00
Colin LeMahieu f08a3ccf50 [Hexagon] Removing static variable holding MCInstrInfo.
llvm-svn: 229872
2015-02-19 17:38:39 +00:00
Benjamin Kramer ea68a944a1 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Chandler Carruth 5d1a84b7b8 [x86] Delete still more piles of complex code now that we have a good
systematic lowering of v8i16.

This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.

In some cases, this is yet another significant improvement.

llvm-svn: 229859
2015-02-19 15:21:57 +00:00
Chandler Carruth 0b39536390 [x86] Teach the unpack lowering how to lower with an initial unpack in
addition to lowering to trees rooted in an unpack.

This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.

llvm-svn: 229856
2015-02-19 15:06:13 +00:00
Chandler Carruth 352eba1c29 [x86] Dramatically improve v8i16 shuffle lowering by not using its
terribly complex partial blend logic.

This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.

This does expose one (minor) weakness of the unpack lowering that I'll
try to address.

In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.

llvm-svn: 229853
2015-02-19 14:08:24 +00:00
Chandler Carruth 2c0390ca4b [x86] Remove the final fallback in the v8i16 lowering that isn't really
needed, and significantly improve the SSSE3 path.

This makes the new strategy much more clear. If we can blend, we just go
with that. If we can't blend, we try to permute into an unpack so
that we handle cases where the unpack doing the blend also simplifies
the shuffle. If that fails and we've got SSSE3, we now call into
factored-out pshufb lowering code so that we leverage the fact that
pshufb can set up a blend for us while shuffling. This generates great
code, especially because we *know* we don't have a fast blend at this
point. Finally, we fall back on decomposing into permutes and blends
because we do at least have a bit-math-based blend if we need to use
that.

This pretty significantly improves some of the v8i16 code paths. We
never need to form pshufb for the single-input shuffles because we have
effective target-specific combines to form it there, but we were missing
its effectiveness in the blends.

llvm-svn: 229851
2015-02-19 13:56:49 +00:00
Chandler Carruth f0f0d27391 [x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing
them into permutes and a blend with the generic decomposition logic.

This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack based merging that this replaces. The only place where it isn't we
end up blending between two packs when we could do a single pack. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.

With this we're down to only really random permutations that cause an
explosion of instructions.

llvm-svn: 229849
2015-02-19 13:15:12 +00:00
Chandler Carruth 8817e5e01b [x86] Remove the insanely over-aggressive unpack lowering strategy for
v16i8 shuffles, and replace it with new facilities.

This uses precise patterns to match exact unpacks, and the new
generalized unpack lowering only when we detect a case where we will
have to shuffle both inputs anyways and they terminate in exactly
a blend.

This fixes all of the blend horrors that I uncovered by always lowering
blends through the vector shuffle lowering. It also removes *sooooo*
much of the crazy instruction sequences required for v16i8 lowering
previously. Much cleaner now.

The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when
it would be marginally nicer to use pshufb+pshufb+por. However, the
difference there is *tiny*. In many cases its a win because we re-use
the pshufb mask. In others, we get to avoid the pshufb entirely. I've
left a FIXME, but I'm dubious we can really do better than this. I'm
actually pretty happy with this lowering now.

For SSE2 this exposes some horrors that were really already there. Those
will have to fixed by changing a different path through the v16i8
lowering.

llvm-svn: 229846
2015-02-19 12:10:37 +00:00
Jozef Kolek 5d171fc291 [mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator
Differential Revision: http://reviews.llvm.org/D7611

llvm-svn: 229845
2015-02-19 11:51:32 +00:00
Chandler Carruth 38dea42ddf [x86] The SELECT x86 DAG combine also does legalization. It used to rely
on things not being marked as either custom or legal, but we now do
custom lowering of more VSELECT nodes. To cope with this, manually
replicate the legality tests here. These have to stay in sync with the
set of tests used in the custom lowering of VSELECT.

Ideally, we wouldn't do any of this combine-based-legalization when we
have an actual custom legalization step for VSELECT, but I'm not going
to be able to rewrite all of that today.

I don't have a test case for this currently, but it was found when
compiling a number of the test-suite benchmarks. I'll try to reduce
a test case and add it.

This should at least fix the test-suite fallout on build bots.

llvm-svn: 229844
2015-02-19 11:43:37 +00:00
Michael Kuperstein efd7a96d2e Reverting r229831 due to multiple ARM/PPC/MIPS build-bot failures.
llvm-svn: 229841
2015-02-19 11:38:11 +00:00
Elena Demikhovsky 69e8b45b13 AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics.
llvm-svn: 229837
2015-02-19 10:48:04 +00:00
Chandler Carruth bcb6c5f62d [x86] Add support for bit-wise blending and use it in the v8 and v16
lowering paths. I'm going to be leveraging this to simplify a lot of the
overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes.

Sadly, this isn't profitable on v4i32 and v2i64. There, the float and
double blending instructions for pre-SSE4.1 are actually pretty good,
and we can't beat them with bit math. And once SSE4.1 comes around we
have direct blending support and this ceases to be relevant.

Also, some of the test cases look odd because the domain fixer
canonicalizes these to floating point domain. That's OK, it'll use the
integer domain when it matters and some day I may be able to update
enough of LLVM to canonicalize the other way.

This restores almost all of the regressions from teaching x86's vselect
lowering to always use vector shuffle lowering for blends. The remaining
problems are because the v16 lowering path is still doing crazy things.
I'll be re-arranging that strategy in more detail in subsequent commits
to finish recovering the performance here.

llvm-svn: 229836
2015-02-19 10:46:52 +00:00
Chandler Carruth b89464a9b6 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Michael Kuperstein ba5b04c798 Use std::bitset for SubtargetFeatures
Previously, subtarget features were a bitfield with the underlying type being uint64_t. 
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.

No functional change.

Differential Revision: http://reviews.llvm.org/D7065

llvm-svn: 229831
2015-02-19 09:01:04 +00:00
Eric Christopher d84f5d30e2 Remove the local subtarget variable from the SystemZ asm printer
and update the two calls accordingly.

llvm-svn: 229805
2015-02-19 01:26:28 +00:00
Eric Christopher 0795a2ef0c Remove a few more calls to TargetMachine::getSubtarget from the
R600 port.

llvm-svn: 229804
2015-02-19 01:10:55 +00:00
Eric Christopher 7edca437f5 Grab the subtarget off of the machine function for the R600
asm printer and clean up a bunch of uses.

llvm-svn: 229803
2015-02-19 01:10:53 +00:00
Eric Christopher 96caeda730 Remove the DisasmEnabled AsmPrinter variable and just look it
up on the subtarget where it's set anyhow than looking it up
2-3 times in the same place.

llvm-svn: 229802
2015-02-19 01:10:49 +00:00
Peter Collingbourne fb8002cbe0 MC: Remove NullStreamer hook, as it is redundant with NullTargetStreamer.
llvm-svn: 229799
2015-02-19 00:45:07 +00:00
Peter Collingbourne 20c7259ce9 Introduce Target::createNullTargetStreamer and use it from IRObjectFile.
A null MCTargetStreamer allows IRObjectFile to ignore target-specific
directives. Previously we were crashing.

Differential Revision: http://reviews.llvm.org/D7711

llvm-svn: 229797
2015-02-19 00:45:02 +00:00
Eric Christopher ca929f2469 Avoid using a self-referential initializer and fix up uses.
llvm-svn: 229790
2015-02-19 00:22:47 +00:00
Eric Christopher 111de895a0 80-column fixups.
llvm-svn: 229789
2015-02-19 00:15:33 +00:00
Eric Christopher 02389e3886 Remove all use of is64bit off of NVPTXSubtarget and clean up code
accordingly. This changes the constructors of a number of classes
that don't need to know the subtarget's 64-bitness.

llvm-svn: 229787
2015-02-19 00:08:27 +00:00
Eric Christopher beffc4e84f Remove all use of getDrvInterface off of NVPTXSubtarget and clean
up code accordingly. Delete code that was checking for all cases
of an enum.

llvm-svn: 229786
2015-02-19 00:08:23 +00:00
Eric Christopher 6aad8b1801 Migrate the NVPTX backend asm printer to a per function subtarget.
This involved moving two non-subtarget dependent features (64-bitness
and the driver interface) to the NVPTX target machine and updating
the uses (or migrating around the subtarget use for ease of review).
Otherwise use the cached subtarget or create a default subtarget
based on the TargetMachine cpu and feature string for the module
level assembler emission.

llvm-svn: 229785
2015-02-19 00:08:14 +00:00
Marek Olsak 9b8f32eed1 R600/SI: Fix READLANE and WRITELANE lane select for VI
VOP2 declares vsrc1, but VOP3 declares src1.
We can't use the same "ins" if the operands have different names in VOP2
and VOP3 encodings.

This fixes a hang in geometry shaders which spill M0 on VI.
(BTW it doesn't look like M0 needs spilling and the spilling seems
duplicated 3 times)

llvm-svn: 229752
2015-02-18 22:12:45 +00:00
Marek Olsak 8eeebcccb5 R600/SI: Simplify verification of AMDGPU::OPERAND_REG_INLINE_C
llvm-svn: 229751
2015-02-18 22:12:41 +00:00
Marek Olsak b8c818337d R600/SI: Remove explicit VOP operand checking
This should be handled by the OperandType checking.

llvm-svn: 229750
2015-02-18 22:12:37 +00:00
Jozef Kolek 3c6724f442 [mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator
Differential Revision: http://reviews.llvm.org/D7609

llvm-svn: 229706
2015-02-18 17:33:56 +00:00
Jozef Kolek 1fd6548297 [mips][microMIPS] Implement JALX instruction
Differential Revision: http://reviews.llvm.org/D5047

llvm-svn: 229702
2015-02-18 17:15:48 +00:00
Daniel Sanders 1779314e3c [mips] Add backend support for Mips32r[35] and Mips64r[35].
Summary:
These ISA's didn't add any instructions so they are almost identical to
Mips32r2 and Mips64r2. Even the ELF e_flags are the same, However the ISA
revision in .MIPS.abiflags is 3 or 5 respectively instead of 2.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: tomatabacu, llvm-commits, atanasyan

Differential Revision: http://reviews.llvm.org/D7381

llvm-svn: 229695
2015-02-18 16:24:50 +00:00
Kit Barton 298beb5e86 This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions.
Phabricator review: http://reviews.llvm.org/D7616

Commiting on Nemanja's behalf.

llvm-svn: 229694
2015-02-18 16:21:46 +00:00
Tom Stellard 1ca873bbc5 R600/SI: Don't set isCodeGenOnly = 1 on all instructions
We only need to set this on pseudo instructions which won't
be used by the assembler.

llvm-svn: 229689
2015-02-18 16:08:17 +00:00
Tom Stellard c34c37ae66 R600/SI: Add missing VOP1 instructions
llvm-svn: 229688
2015-02-18 16:08:15 +00:00
Tom Stellard 894b9883f4 R600/SI: Add missing VOP2 instructions
llvm-svn: 229687
2015-02-18 16:08:14 +00:00
Tom Stellard 0c0008cb6e R600/SI: Add definition for S_CBRANCH_G_FORK
llvm-svn: 229686
2015-02-18 16:08:13 +00:00
Tom Stellard ce449ade7e R600/SI: Add missing SOP1 instructions
llvm-svn: 229685
2015-02-18 16:08:11 +00:00
Tom Stellard ee21faa029 R600/SI: Refactor SOP2 definitions
llvm-svn: 229684
2015-02-18 16:08:09 +00:00
Vasileios Kalintiris 611cb70b83 [mips] Avoid redundant sign extension of the result of binary bitwise instructions.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7581

llvm-svn: 229675
2015-02-18 14:57:05 +00:00
Benjamin Kramer 6ca8992018 X86: Use bitset to manage a bag of bits. NFC.
Doesn't matter in terms of memory usage or perf here, but it's a neat
simplification.

llvm-svn: 229672
2015-02-18 14:10:44 +00:00
Toma Tabacu 8874eac5e6 [mips] [IAS] Fix using .cpsetup with local labels (PR22518).
Summary:
Parse for an MCExpr instead of an Identifier and use the symbol for relocations, not just the symbol's name.

This fixes errors when using local labels in .cpsetup (PR22518).

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D7697

llvm-svn: 229671
2015-02-18 13:46:53 +00:00
Chandler Carruth bbb377c3a1 [x86] Tighten the assertions to document that canonicalization has
actually removed all but a *very* small number of choices for v2i64.
Also remove dead code handling cases that simply cannot arise.

llvm-svn: 229670
2015-02-18 11:46:29 +00:00
Chandler Carruth 811f0ee8c1 [x86] Switch an if which is trivially true to an assert. NFC
llvm-svn: 229669
2015-02-18 11:46:27 +00:00