Commit Graph

28026 Commits

Author SHA1 Message Date
NAKAMURA Takumi d8570e5bc2 Fix abuse of StringRef on ARM64SysReg::MRSMapper::toString(Val, Valid).
FIXME: Could we use SmallString here?
llvm-svn: 205950
2014-04-10 03:05:59 +00:00
Saleem Abdulrasool c5e0099ffc ARM64: add an explicit cast to silence a silly warning
GCC 4.8 complains with:
  warning: enumeral and non-enumeral type in conditional expression

Although this is silly and harmless in this case, add an explicit cast to
silence the warning.

llvm-svn: 205949
2014-04-10 02:48:10 +00:00
Juergen Ributzka 48c8c07d0a [ARM64] Fix immediate cost calculation for types larger than i64.
The immediate cost calculation code was hitting an assertion in the included
test case, because APInt was still internally 128-bits. Truncating it to 64-bits
fixed the issue.

Fixes <rdar://problem/16572521>.

llvm-svn: 205947
2014-04-10 01:36:59 +00:00
Reid Kleckner 2d4a69e9c9 Revert "For the ARM integrated assembler add checking of the alignments on vld/vst instructions. And report errors for alignments that are not supported."
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors.  This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.

This reverts commit r205930.

llvm-svn: 205944
2014-04-10 00:52:14 +00:00
Jim Grosbach e4fef71981 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

llvm-svn: 205938
2014-04-09 23:39:25 +00:00
Kevin Enderby c296ecd96c For the ARM integrated assembler add checking of the
alignments on vld/vst instructions.  And report errors for
alignments that are not supported.

While this is a large diff and an big test case, the changes
are very straight forward.  But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.

rdar://11312406

llvm-svn: 205930
2014-04-09 21:32:59 +00:00
Chad Rosier 5f8d6a6c15 [AArch64] Implement the isZExtFree APIs.
llvm-svn: 205926
2014-04-09 20:51:21 +00:00
Chad Rosier 9ce19fb65c [AArch64] Implement the isTruncateFree API.
In AArch64 i64 to i32 truncate operation is a subregister access.

This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).

llvm-svn: 205925
2014-04-09 20:43:40 +00:00
Bob Wilson ae89ddedff Simple fix for build failures resulting from r205867.
llvm-svn: 205918
2014-04-09 18:34:45 +00:00
Justin Holewinski 30d56a7b86 [NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).

llvm-svn: 205907
2014-04-09 15:39:15 +00:00
Justin Holewinski 9d852a8e08 [NVPTX] Add support for addrspacecast in global variable initializers, including emitting generic() when casting to address space 0.
llvm-svn: 205906
2014-04-09 15:39:11 +00:00
Justin Holewinski 5959695a16 [NVPTX] Add query support for read-write images and managed variables
This also fixes a bug in the annotation cache where the cache will not be cleared between modules if multiple modules are compiled in the same process.

llvm-svn: 205905
2014-04-09 15:38:52 +00:00
Alp Toker 16f98b255d Fix some doc and comment typos
llvm-svn: 205899
2014-04-09 14:47:27 +00:00
Bradley Smith 246b0b617d [ARM64] Change SYS without a register to an alias to make disassembling more consistant.
llvm-svn: 205898
2014-04-09 14:44:58 +00:00
Bradley Smith 2cef19a2e6 [ARM64] Correctly disassemble ISB operand as ISB not DBarrier.
llvm-svn: 205897
2014-04-09 14:44:54 +00:00
Bradley Smith 239120cada [ARM64] Properly support both apple and standard syntax for FMOV
llvm-svn: 205896
2014-04-09 14:44:49 +00:00
Bradley Smith a2308f47d3 [ARM64] Flag setting logical/add/sub immediate instructions don't use SP.
llvm-svn: 205895
2014-04-09 14:44:44 +00:00
Bradley Smith f280e91849 [ARM64] Conditional branches must always print their condition code, even AL.
llvm-svn: 205894
2014-04-09 14:44:39 +00:00
Bradley Smith a19b7e83dc [ARM64] Fix disassembly logic for extended loads/stores with 32-bit registers.
llvm-svn: 205893
2014-04-09 14:44:36 +00:00
Bradley Smith a0d7a9a12f [ARM64] When printing a pre-indexed address with #0, the ', #0' is not optional.
llvm-svn: 205892
2014-04-09 14:44:31 +00:00
Bradley Smith 70c6acbbfd [ARM64] Add missing shifted register MVN alias to ORN
llvm-svn: 205891
2014-04-09 14:44:26 +00:00
Bradley Smith 403bbf95c0 [ARM64] SXTW/UXTW are only valid aliases for 32-bit operations.
llvm-svn: 205890
2014-04-09 14:44:22 +00:00
Bradley Smith 779238a216 [ARM64] Fix canonicalisation of MOVs. MOV is too complex to be modelled by a dumb alias.
llvm-svn: 205889
2014-04-09 14:44:18 +00:00
Bradley Smith f823079acd [ARM64] Fixup ADR/ADRP parsing such that they accept immediates and all labels types
llvm-svn: 205888
2014-04-09 14:44:12 +00:00
Bradley Smith af2710c96f [ARM64] Ensure sp is decoded as SP, not XZR in LD1 instructions.
llvm-svn: 205887
2014-04-09 14:44:07 +00:00
Bradley Smith a0dce246ed [ARM64] Tighten up the special casing in emitting arithmetic extends. UXTW should only be translated when the instruction uses WSP, not SP. Vice versa for UXTX and 64-bit instructions.
llvm-svn: 205886
2014-04-09 14:44:03 +00:00
Bradley Smith 3971d3dc75 [ARM64] Rename LR to the UAL-compliant 'X30'.
llvm-svn: 205885
2014-04-09 14:43:59 +00:00
Bradley Smith 6f1aa59c31 [ARM64] Rename FP to the UAL-compliant 'X29'.
llvm-svn: 205884
2014-04-09 14:43:50 +00:00
Bradley Smith 5511f08055 [ARM64] Add a PostEncoderMethod to FCMP - the Rm field should canonically be zero but should be decoded/disassembled with any value.
llvm-svn: 205883
2014-04-09 14:43:40 +00:00
Bradley Smith eb4ca04db2 [ARM64] SCVTF and FCVTZS/U are undefined if scale<5> == 0.
llvm-svn: 205882
2014-04-09 14:43:35 +00:00
Bradley Smith db7b9b17eb [ARM64] EXT and EXTR instructions on v8i8 and W regs respectively must have the top bit of their immediate clear.
llvm-svn: 205881
2014-04-09 14:43:31 +00:00
Bradley Smith 60e7667886 [ARM64] Scaled fixed-point FCVTZSs should also have bit 29 set to zero.
llvm-svn: 205880
2014-04-09 14:43:27 +00:00
Bradley Smith 7525b47208 [ARM64] UBFM/BFM is undefined on w registers when imms<5> or immr<5> is 1.
llvm-svn: 205879
2014-04-09 14:43:24 +00:00
Bradley Smith 0243aa33fa [ARM64] Floating point to fixed point scaled conversions are only available on fcvtzs and fcvtzu.
llvm-svn: 205878
2014-04-09 14:43:20 +00:00
Bradley Smith 8f906a3c5f [ARM64] Port over the PostEncoderMethod fix for SMULH/UMULH from AArch64.
llvm-svn: 205877
2014-04-09 14:43:15 +00:00
Bradley Smith 9f29b726d5 [ARM64] Add missing tlbi operands and error for extra/missing register on tlbi aliases.
llvm-svn: 205876
2014-04-09 14:43:11 +00:00
Bradley Smith e8b4166acc [ARM64] Rework system register parsing to overcome SPSel clash in MSR variants.
llvm-svn: 205875
2014-04-09 14:43:06 +00:00
Bradley Smith bc35b1f138 [ARM64] Port over the PostEncoderMethod from AArch64 for exclusive loads and stores, so the unused register fields are set to all-ones canonically but are recognised with any value.
llvm-svn: 205874
2014-04-09 14:43:01 +00:00
Bradley Smith 4925be9b56 [ARM64] Use PStateMapper to ensure that MSRcpsr operands are validated during disassembly.
llvm-svn: 205873
2014-04-09 14:42:56 +00:00
Bradley Smith 3339427e2a [ARM64] Remove PrefetchOp and use ARM64PRFM instead.
llvm-svn: 205872
2014-04-09 14:42:53 +00:00
Bradley Smith 16478c4ccf [ARM64] Add WZR to isGPR32Register, since every use needs to check for this anyway.
llvm-svn: 205871
2014-04-09 14:42:49 +00:00
Bradley Smith 3db2a85853 [ARM64] Remove ARM64SYS.
llvm-svn: 205870
2014-04-09 14:42:45 +00:00
Bradley Smith fb90df563f [ARM64] Move CPSRField and DBarrier operands over to AArch64-style disassembly and assembly. This removes the last users of namespace ARM64SYS.
llvm-svn: 205869
2014-04-09 14:42:42 +00:00
Bradley Smith 08c391c156 [ARM64] Switch the decoder, disassembler, instprinter and asmparser over to using AArch64-style system registers, and fix up test failures discovered in the process.
llvm-svn: 205868
2014-04-09 14:42:36 +00:00
Bradley Smith 2ba17a4a17 [ARM64] Move ARM64BaseInfo.{cpp,h} into a Utils/ subdirectory, a la AArch64. These files are required in the decoder, disassembler and parser, and a layering violation was imminent.
llvm-svn: 205867
2014-04-09 14:42:27 +00:00
Bradley Smith ceeb04df60 [ARM64] Copy the named immediate operand mapping logic and enums from AArch64. AArch64's named immediate mapping and parsing is much more advanced than ARM64's. No functionality change - they're currently living side by side while I switch uses over.
llvm-svn: 205866
2014-04-09 14:42:16 +00:00
Bradley Smith 8c0b88c987 [ARM64] Shifted register ALU ops are reserved if sf=0 and imm6<5>=1, and also (for add/sub only) if shift=11.
llvm-svn: 205865
2014-04-09 14:42:11 +00:00
Bradley Smith 527bf86e56 [ARM64] Add support for NV condition code (exists only for valid assembly/disassembly, equivilant to AL)
llvm-svn: 205864
2014-04-09 14:42:07 +00:00
Bradley Smith 6d7af17a3f [ARM64] Add missing 1Q -> 1q vector kind alias
llvm-svn: 205863
2014-04-09 14:42:01 +00:00
Bradley Smith 7d253f29a4 [ARM64] Add parsing for vector lists such as {v0.8b-v3.8b}
llvm-svn: 205862
2014-04-09 14:41:58 +00:00
Bradley Smith 664aa67153 [ARM64] Correctly alias LSL to UXTW for 32bit instruction variants, rather than UXTX
llvm-svn: 205861
2014-04-09 14:41:53 +00:00
Bradley Smith 35cadc58c9 [ARM64] STRHro and STRBro were not being decoded at all.
llvm-svn: 205860
2014-04-09 14:41:49 +00:00
Bradley Smith 87c60e00d5 [ARM64] MOVK with sf=0 and hw<1>=1 is unallocated. Shift amount for ADD/SUB instructions is unallocated if shift > 4.
llvm-svn: 205859
2014-04-09 14:41:45 +00:00
Bradley Smith cd91e5cd0c [ARM64] Register-offset loads and stores with the 'option' field equal to 00x or 10x are undefined.
llvm-svn: 205858
2014-04-09 14:41:38 +00:00
Elena Demikhovsky cf0b9bafc3 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Daniel Sanders b282f1fec5 Re-commit: [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies).

nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008.

Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3274

llvm-svn: 205844
2014-04-09 09:56:43 +00:00
Matt Arsenault 2c33562cd6 R600/SI: Match not instruction.
llvm-svn: 205837
2014-04-09 07:16:16 +00:00
Tim Northover b36d428d27 ARM64: scalarize v1i64 mul operation
This is the second part of fixing PR19367.

llvm-svn: 205836
2014-04-09 07:07:02 +00:00
Tim Northover b430cf6681 ARM64: add pattern for <1 x i64> custom not node.
This should fix PR19367.

llvm-svn: 205835
2014-04-09 06:55:39 +00:00
Saleem Abdulrasool fedfc2a63f ARM MC: 80 column
llvm-svn: 205833
2014-04-09 06:18:26 +00:00
Saleem Abdulrasool 5019a978ae ARM MC: sort source files in CMakeLists
llvm-svn: 205832
2014-04-09 06:18:23 +00:00
Craig Topper 8d399f87af [C++11] Replace some comparisons with 'nullptr' with simple boolean checks to reduce verbosity.
llvm-svn: 205829
2014-04-09 04:20:00 +00:00
Juergen Ributzka c11e8b67bb [Constant Hoisting][ARM64] Enable constant hoisting for ARM64.
This implements the target-hooks for ARM64 to enable constant hoisting.

This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.

llvm-svn: 205791
2014-04-08 20:39:59 +00:00
Hal Finkel a775e51274 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

llvm-svn: 205787
2014-04-08 19:00:27 +00:00
Kevin Enderby d88fec3d3a Fix the ARM VLD3 (single 3-element structure to all lanes)
size 16 double-spaced registers instruction printing.

This:
	vld3.16 {d0[], d2[], d4[]}, [r4]!

was being printed as:

	vld3.16	{d0[], d1[], d2[]}, [r4]!

rdar://16531387

llvm-svn: 205779
2014-04-08 18:00:52 +00:00
NAKAMURA Takumi 35289340de X86MCAsmInfoGNUCOFF: Set PointerSize as 8 for targeting x64. It caused DW_LNE_set_address was misemitted on x64.
FIXME: I haven't investigate whether CalleeSaveStackSlotSize should be 8.
llvm-svn: 205772
2014-04-08 15:28:50 +00:00
Tim Northover 33d07468bc ARM64: fix fmsub patterns which assumed accum operand was first
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).

This should fix PR19345, assuming there's only one issue.

llvm-svn: 205758
2014-04-08 12:23:51 +00:00
Elena Demikhovsky 3dcfbdfa54 AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
David Majnemer c9d2625586 X86: Split the relocation selection up
Before, we would have conditional operators where one side of the
operator would be of type RelocationTypeAMD64 and the other is of type
RelocationTypeI386.  GCC would noisly warn with -Wenum-compare
diagnostic.

Instead, refactor the code so it is more like the X86 ELF object writer.

llvm-svn: 205752
2014-04-08 02:15:13 +00:00
Jim Grosbach e75c048ab9 Tidy up comments a bit.
Punctuation, grammar, formatting, etc..

llvm-svn: 205749
2014-04-07 23:47:23 +00:00
Jim Grosbach 75010e7712 ARM64: Range based for loop in ARM64PromoteConstant pass
llvm-svn: 205748
2014-04-07 23:47:21 +00:00
Jim Grosbach 64a28e70c8 ARM64: Clean up file header comment a bit.
llvm-svn: 205747
2014-04-07 23:14:38 +00:00
Reed Kotler 735da8e015 Reverting commit r205628 due to mips64 issues.
llvm-svn: 205741
2014-04-07 22:11:40 +00:00
Tom Stellard 204e61bbdf R600/SI: Handle INSERT_SUBREG in SIFixSGPRCopies
llvm-svn: 205732
2014-04-07 19:45:45 +00:00
Tom Stellard 50122a5890 R600: Match 24-bit arithmetic patterns in a Target DAGCombine
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.

This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched.  This occasionally
resulted in some instructions being incorrectly deleted from the
program.

v2:
  - Fix bug with 64-bit mul

llvm-svn: 205731
2014-04-07 19:45:41 +00:00
Tom Stellard 3cbe014027 R600: Replace dyn_cast + assert with cast
llvm-svn: 205730
2014-04-07 19:31:13 +00:00
Matt Arsenault 4be76e99fe Use std::swap
llvm-svn: 205723
2014-04-07 16:44:26 +00:00
Matt Arsenault 7939acd7fa Use .data() instead of &x[0]
llvm-svn: 205722
2014-04-07 16:44:24 +00:00
David Blaikie 2f7711242a MachineInstr: introduce explicit_operands and implicit_operands ranges
Makes iteration over implicit and explicit machine operands more
explicit (har har). Insipired by code review discussion for r205565.

llvm-svn: 205680
2014-04-05 22:42:04 +00:00
Saleem Abdulrasool dd979e6457 ARM: consolidate MachO checks for ARM asm parser
This consolidates the duplicated MachO checks in the directive parsing for
various directives that are unsupported for Mach-O.  The error message change is
unimportant as this restores the behaviour to that prior to the addition of the
new directive handling.  Furthermore, use a more direct check for MachO
targeting rather than an indirect feature check of the assembler.

Also simplify the test execution command to avoid temporary files.  Further more,
perform the check in both object and assembly emission.

Whether all non-applicable directives are handled is another question.  .fnstart
is marked as being unsupported, however, the complementary .fnend is not.  The
additional unwinding directives are also still honoured.  This change does not
change that, though, it would be good to validate and mark them as being
unsupported if they are unsupported for the MachO emission.

llvm-svn: 205678
2014-04-05 22:09:51 +00:00
Hal Finkel 41e9b1d559 [PowerPC] Remove unused TM member variable to unbreak build
Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]"

llvm-svn: 205660
2014-04-05 00:16:28 +00:00
Hal Finkel de0b413ec0 [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

llvm-svn: 205658
2014-04-04 23:51:18 +00:00
Hal Finkel b1308d525c [PowerPC] PPCTTI Cleanup
Remove the declaration of an unimplemented function.

llvm-svn: 205657
2014-04-04 23:51:11 +00:00
Matt Arsenault cf6f688a40 Add DAG parameter to ComputeNumSignBitsForTargetNode
This way, you can check the number of sign bits in the
operands. The depth parameter it already has is pretty useless
without this.

llvm-svn: 205649
2014-04-04 20:13:13 +00:00
Matt Arsenault 5e1e4316c4 Fix tabs
llvm-svn: 205648
2014-04-04 20:13:08 +00:00
Kai Nacke 6da86e8529 [mips] Add Octeon cnMips instructions seqi/snei and v3mulu/vmm0/vmulu.
This patch adds the Octeon cnMips instructions seqi/snei and v3mulu/vmm0/vmulu.
It is only for the assembler. Test case is included.

Reviewed by: Daniel.Sanders@imgtec.com

llvm-svn: 205631
2014-04-04 16:21:59 +00:00
Hal Finkel fbf7e2a1a1 [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

llvm-svn: 205630
2014-04-04 15:15:57 +00:00
Daniel Sanders d4341a0ad7 [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the
processor which are used to select between the 1985 and 2008 versions of IEEE
754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid
operation exceptions when given NaN), in 2008 mode they are non-arithmetic
(i.e. they are copies).

nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because
the ISA spec does not explicitly state that they obey Has2008 and ABS2008.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3274

llvm-svn: 205628
2014-04-04 14:52:54 +00:00
Tim Northover 07a8ff4892 ARM64: handle v1i1 types arising from setcc properly.
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.

Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.

Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).

Should fix PR19335.

llvm-svn: 205625
2014-04-04 14:49:21 +00:00
Stepan Dyatkovskiy 3f1fa3d545 Fix for PR18921 (LDRD/STRD part)::
Removed "GNU Assembler extension (compatibility)" definitions from ARMInstrInfo.td
Fixed ARMAsmParser::ParseInstruction GNU compatability branch, so it also works for thumb mode from now.
Added new tests.

llvm-svn: 205622
2014-04-04 10:17:56 +00:00
Tim Northover 85d6a16c46 ARM64: use regalloc-friendly COPY_TO_REGCLASS for bitcasts
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.

It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.

It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.

This should also fix PR19331.

llvm-svn: 205616
2014-04-04 09:03:09 +00:00
Tim Northover 1e4f2c5e5f ARM64: add 128-bit MLA operations to the custom selection code.
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.

Should fix PR19332.

llvm-svn: 205615
2014-04-04 09:03:02 +00:00
Stepan Dyatkovskiy a09bd2379c Fixed register class in STRD instruction for Thumb2 mode.
llvm-svn: 205612
2014-04-04 08:14:13 +00:00
Craig Topper 840beec2d0 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
2014-04-04 05:16:06 +00:00
Jim Grosbach 537f3ed838 ARM: Range based for-loop over block predecessors.
No functional change.

llvm-svn: 205604
2014-04-04 02:11:03 +00:00
Jim Grosbach f92e8f5a8b ARM: Use range-based for loops in frame lowering.
No functional change.

llvm-svn: 205602
2014-04-04 02:10:55 +00:00
Quentin Colombet 9c816f39ad Revert r205599, the commit was not intended to have so many changes
llvm-svn: 205600
2014-04-04 02:02:49 +00:00
Quentin Colombet 7ee4e79dec [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are hit.

This is related to PR18747.

Patch by MAYUR PANDEY <mayur.p@samsung.com>

llvm-svn: 205599
2014-04-04 01:58:57 +00:00
Saleem Abdulrasool a7a8a3e3ee MIPS: remove vim swap file
llvm-svn: 205595
2014-04-04 01:19:54 +00:00
Jim Grosbach b8bd4a5e2a Tidy up. Space before ':' in range-based for loops.
llvm-svn: 205585
2014-04-03 23:43:26 +00:00
Jim Grosbach bb1af943bb Tidy up. 80 columns.
llvm-svn: 205584
2014-04-03 23:43:22 +00:00
Jim Grosbach 1a59711505 Tidy up. Trailing whitespace.
llvm-svn: 205583
2014-04-03 23:43:18 +00:00
Jim Grosbach e04eb1dc12 Fix typo.
llvm-svn: 205582
2014-04-03 23:43:12 +00:00
Eli Bendersky bbef172f19 Optimize away unnecessary address casts.
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.

Patch by Jingyue Wu.

llvm-svn: 205571
2014-04-03 21:18:25 +00:00
Lang Hames cb74fa696b [ARM64] Teach the ARM64DeadRegisterDefinition pass to respect implicit-defs.
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
  %X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32

These instructions are live, and their definitions should not be rewritten.

Fixes <rdar://problem/16492408>

llvm-svn: 205565
2014-04-03 20:51:08 +00:00
Tom Stellard a0150cb6a9 R600: Correct opcode for BFE_INT
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4

Fixes Arithm/Absdiff.Mat/3 OpenCV test

Patch by: Bruno Jiménez

llvm-svn: 205562
2014-04-03 20:19:29 +00:00
Tom Stellard 7ed0b5235a R600/SI: Lower 64-bit immediates using REG_SEQUENCE
llvm-svn: 205561
2014-04-03 20:19:27 +00:00
Tim Northover 01b4aa9437 ARM: tell LLVM about zext properties of ldrexb/ldrexh
Implementing this via ComputeMaskedBits has two advantages:
  + It actually works. DAGISel doesn't deal with the chains properly
    in the previous pattern-based solution, so they never trigger.
  + The information can be used in other DAG combines, as well as the
    trivial "get rid of truncs". For example if the trunc is in a
    different basic block.

rdar://problem/16227836

llvm-svn: 205540
2014-04-03 15:10:35 +00:00
Daniel Sanders 442f1a12f1 [mips] Implement ehb, ssnop, and pause in assembler
Summary: Add negative tests for pause

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3246

llvm-svn: 205537
2014-04-03 13:21:51 +00:00
Tim Northover 70450c59a4 ARM: skip cmpxchg failure barrier if ordering is monotonic.
The terminal barrier of a cmpxchg expansion will be either Acquire or
SequentiallyConsistent. In either case it can be skipped if the
operation has Monotonic requirements on failure.

rdar://problem/15996804

llvm-svn: 205535
2014-04-03 13:06:54 +00:00
Zoran Jovanovic cabf0f41e0 Implementation of 16-bit microMIPS instructions MFHI and MFLO.
Differential Revision: http://llvm-reviews.chandlerc.com/D3141

llvm-svn: 205532
2014-04-03 12:47:34 +00:00
Daniel Sanders f7b32291ad [mips] Add initial (experimental) MIPS-IV support.
Summary:
Adds the 'mips4' processor and a simple test of the ELF e_flags.

Patch by David Chisnall
His work was sponsored by: DARPA, AFRL

I made one small change to the testcase so that it uses
mips64-unknown-linux instead of mips4-unknown-linux.

This patch indirectly adds FeatureCondMov to FeatureMips64. This is ok
because it's supposed to be there anyway and it turns out that
FeatureCondMov is not a predicate of any instructions at the moment
(this is a bug that hasn't been noticed because there are no targets
without the conditional move instructions yet).

CC: theraven

Differential Revision: http://llvm-reviews.chandlerc.com/D3244

llvm-svn: 205530
2014-04-03 12:13:36 +00:00
Zoran Jovanovic 842f20ef0b MicroMIPS specific little endian fixup data byte ordering.
Differential Revision: http://llvm-reviews.chandlerc.com/D3245

llvm-svn: 205528
2014-04-03 12:01:01 +00:00
Tim Northover c882eb0723 ARM: expand atomic ldrex/strex loops in IR
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).

Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:

1. an atomicrmw followed by using the *new* value can be more
   efficient. As an IR pass, simple CSE could handle this
   efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
   in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
   optimisation.

I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).

llvm-svn: 205525
2014-04-03 11:44:58 +00:00
Stepan Dyatkovskiy 6207a4dadc PR19320:
The trouble as in ARMAsmParser, in ParseInstruction method. It assumes that ARM::R12 + 1 == ARM::SP.
It is wrong, since ARM::<Register> codes are generated by tablegen and actually could be any random numbers.

llvm-svn: 205524
2014-04-03 11:29:15 +00:00
Silviu Baranga a3106e6847 [ARM] When generating a vpaddl node the input lane type is not always the type of the
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.

llvm-svn: 205523
2014-04-03 10:44:27 +00:00
Sasa Stankovic 06c4780311 [mips] Extend MipsMCExpr class to handle %higher(sym1 - sym2 + const) and
%highest(sym1 - sym2 + const) relocations. Remove "ABS_" from VK_Mips_HI
and VK_Mips_LO enums in MipsMCExpr, to be consistent with VK_Mips_HIGHER
and VK_Mips_HIGHEST.

This change also deletes test file test/MC/Mips/higher_highest.ll and moves
its CHECK's to the new test file test/MC/Mips/higher-highest-addressing.s.
The deleted file tests that R_MIPS_HIGHER and R_MIPS_HIGHEST relocations are
emitted in the .o file. Since it uses -force-mips-long-branch option, it was
created when MipsLongBranch's implementation was emitting R_MIPS_HIGHER and
R_MIPS_HIGHEST relocations in the .o file. It was disabled when MipsLongBranch
started to directly calculate offsets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3230

llvm-svn: 205522
2014-04-03 10:37:45 +00:00
Tim Northover 2ad88d3aab ARM64: always use i64 for the RHS of shift operations
Switching between i32 and i64 based on the LHS type is a good idea in
theory, but pre-legalisation uses i64 regardless of our choice,
leading to potential ISel errors.

Should fix PR19294.

llvm-svn: 205519
2014-04-03 09:26:16 +00:00
Oliver Stannard 92e0fc0484 ARM: Use __STACK_LIMIT symbol for segmented stacks
We cannot use STACK_LIMIT, as it is not reserved for the compiler
by the C spec.

llvm-svn: 205516
2014-04-03 08:45:16 +00:00
Tim Northover c7c6a93704 ARM64: don't generate __sincos_stret calls unless on MachO
This should fix PR19314.

llvm-svn: 205514
2014-04-03 07:06:13 +00:00
Lang Hames c59a2d0529 [X86] As per suggestion from Craig Topper and Hal Finkel, override
TargetInstrInfo::findCommutedOpIndices to enable VFMA*231 commutation, rather
than abusing commuteInstruction.

Thanks very much for the suggestion guys!

llvm-svn: 205489
2014-04-02 23:57:49 +00:00
Hal Finkel f823380a44 [PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.

llvm-svn: 205476
2014-04-02 22:43:49 +00:00
Lang Hames c2c751312e [X86] Make the VFMA*231 variants commutable and relax the alignment restrictions
on FMA3 memory operands. FMA3 instructions are VEX encoded, so they can load
from unaligned memory.

Testcase to follow, along with related patch.

<rdar://problem/16478629>

llvm-svn: 205472
2014-04-02 22:06:16 +00:00
Juergen Ributzka 27435b3b8a Add comments and test case for [X86TTI] Make constant base pointers for GetElementPtr opaque (r204739).
llvm-svn: 205468
2014-04-02 21:45:36 +00:00
Saleem Abdulrasool cd1308296e ARM: update subtarget information for Windows on ARM
Update the subtarget information for Windows on ARM.  This enables using the MC
layer to target Windows on ARM.

llvm-svn: 205459
2014-04-02 20:32:05 +00:00
Jim Grosbach 2a2459f365 Make a few more range-based loops use explicit types.
No functional change.

llvm-svn: 205458
2014-04-02 20:21:22 +00:00
Tom Stellard 36a031870b TargetLibraryInfo: Disable memcpy and memset on R600
There are no implementations of these for R600.

llvm-svn: 205455
2014-04-02 19:53:29 +00:00
Jim Grosbach 36c4953348 Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453
2014-04-02 19:28:18 +00:00
Jim Grosbach 4a1a9ce5e6 ARM: cortex-m0 doesn't support unaligned memory access.
Unlike other v6+ processors, cortex-m0 never supports unaligned accesses.
From the v6m ARM ARM:

"A3.2 Alignment support: ARMv6-M always generates a fault when an unaligned
access occurs."

rdar://16491560

llvm-svn: 205452
2014-04-02 19:28:13 +00:00
Jim Grosbach df1e05bb8a Make some range based loop types more explicit.
No functional change, but more readable code.

llvm-svn: 205451
2014-04-02 19:28:08 +00:00
Kai Nacke 13673ac704 [mips] Add more Octeon cnMips instructions
Adds the instructions ext/ext32/cins/cins32.
It also changes pop/dpop to accept the two operand version and
adds a simple pattern to generate baddu.
Tests for the two operand versions (including baddu/dmul/dpop/pop)
and the code generation pattern for baddu are included.

Reviewed by: Daniel.Sanders@imgtec.com

llvm-svn: 205449
2014-04-02 18:40:43 +00:00
Jim Grosbach 20b0790df7 [C++11,ARM64] Range based for and explicit 'override' in STP cleanup.
No functional change intended.

llvm-svn: 205446
2014-04-02 18:00:59 +00:00
Jim Grosbach 05abd709f3 [C++11,ARM64] Range based for loops in constant promotion.
No functional change intended.

llvm-svn: 205445
2014-04-02 18:00:56 +00:00
Jim Grosbach 7dc9edeaa5 [C++11,ARM64] Range based for loops in load/store pair optimizer.
No functional change intended.

llvm-svn: 205444
2014-04-02 18:00:53 +00:00
Jim Grosbach 020e657790 [C++11,ARM64] Range based for loops in target lowering.
No functional change intended.

llvm-svn: 205443
2014-04-02 18:00:51 +00:00
Jim Grosbach 91f1f47751 [C++11,ARM64] Range based for loops in frame lowering.
No functional change intended.

llvm-svn: 205442
2014-04-02 18:00:49 +00:00
Jim Grosbach f39d752b03 [C++11,ARM64] Range based for loops in pseudo expansion.
No functional change intended.

llvm-svn: 205441
2014-04-02 18:00:46 +00:00
Jim Grosbach 673825ebac [C++11,ARM64] Range based for loops for LOH
No functional change intended.

llvm-svn: 205440
2014-04-02 18:00:44 +00:00
Jim Grosbach 2539c3d07a [C++11,ARM64] Range based for loops TLS cleanup.
No functional change intended.

llvm-svn: 205439
2014-04-02 18:00:41 +00:00
Jim Grosbach 0d0c5a614a [C++11,ARM64] Range based for loops in branch relaxation.
No functional change intended.

llvm-svn: 205438
2014-04-02 18:00:39 +00:00
Jim Grosbach 1c762ca9bd [C++11,ARM64] Range based for loops in address type promotion.
No functional change intended.

llvm-svn: 205437
2014-04-02 18:00:36 +00:00
Quentin Colombet 7bf9d8cd13 [ARM64][CollectLOH] Remove the link to the radar from the comments.
llvm-svn: 205435
2014-04-02 16:40:49 +00:00
Oliver Stannard b14c625111 ARM: Add support for segmented stacks
Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav.

llvm-svn: 205430
2014-04-02 16:10:33 +00:00
Tim Northover 6d69168ffd ARM64: use GOT for weak symbols & PIC.
Weak symbols cannot use the small code model's usual ADRP sequences since the
instruction simply may not be able to encode a value of 0.

This redirects them to use the GOT, which hopefully linkers are able to cope
with even in the static relocation model.

llvm-svn: 205426
2014-04-02 14:39:11 +00:00
Tim Northover 0d80f70530 ARM64: fix lowering of fp128 fptosi/fptoui
We were creating libcall nodes that returned an MVT::f128, when these
particular operations actually return an int of some stripe.

llvm-svn: 205425
2014-04-02 14:39:07 +00:00
Tim Northover ebd37ab382 ARM64: make sure first argument to INSERT_SUBVECTOR has right type.
Again, coalescing and other optimisations swiftly made the MachineInstrs
consistent again, but when compiled at -O0 a bad INSERT_SUBREGISTER was
produced.

llvm-svn: 205423
2014-04-02 14:38:58 +00:00
Tim Northover 5e3a484e3b ARM64: convert fp16 narrowing ISel to pseudo-instruction
The previous attempt was fine with optimisations, but was actually rather
cavalier with its types. When compiled at -O0, it produced invalid COPY
MachineInstrs.

llvm-svn: 205422
2014-04-02 14:38:54 +00:00
Job Noorman f7da105f39 Mark FPB as a reserved register when needed.
llvm-svn: 205421
2014-04-02 13:13:56 +00:00
Renato Golin d93295ea56 Remove duplicated DMB instructions
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.

Patch by Reinoud Elhorst.

llvm-svn: 205409
2014-04-02 09:03:43 +00:00
Yaron Keren 2895496852 Added isTargetWindowsMSVC(), renamed isTargetMingw() to isTargetWindowsGNU()
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.

Suggestion by Saleem Abdulrasool!

llvm-svn: 205393
2014-04-02 04:27:51 +00:00
Quentin Colombet 3c2b13b258 [ARM64][CollectLOH] Add some comments to explain how the LOHs
framework works (for the compiler part), since the design
document is not available.

llvm-svn: 205379
2014-04-02 01:02:28 +00:00
Hal Finkel 9e0baa6d3a [PowerPC] Add some missing VSX bitcast patterns
llvm-svn: 205352
2014-04-01 19:24:27 +00:00
Yaron Keren 48d68d439a If isKnownWindowsMSVCEnvironment then getOS == Triple::Win32 and
Environment == Triple::MSVC so it will never be MinGW or Cygwin.

llvm-svn: 205349
2014-04-01 18:52:55 +00:00
Hal Finkel 2eed29f3c8 Implement X86TTI::getUnrollingPreferences
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).

These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.

The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
  ControlLoops-dbl - 13% speedup
  ControlLoops-flt - 15% speedup
  Reductions-dbl - 7.5% speedup

llvm-svn: 205348
2014-04-01 18:50:34 +00:00
Kai Nacke af47f60f83 [mips] Add Octeon cnMips instructions mtmX and mtpX
Adds the Octeon cnMips instructions "load multiplier register MPLx" and "load product register Px".
Includes tests.

Reviews by: Daniel.Sanders@imgtec.com

llvm-svn: 205343
2014-04-01 18:35:26 +00:00
Reid Kleckner 101102711d Support segmented stacks on Win64
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.

llvm-svn: 205342
2014-04-01 18:34:21 +00:00
Yaron Keren 136fe7db46 isTargetWindows() renamed to isTargetKnownWindowsMSVC()
to reflect its current functionality.

Based on Takumi NAKAMURA suggestion.

llvm-svn: 205338
2014-04-01 18:15:34 +00:00
Christian Pirker dc9ff75554 ARM: rename ARMle/ARMbe with ARMLE/ARMBE, and Thumble/Thumbbe with ThumbLE/ThumbBE
llvm-svn: 205317
2014-04-01 15:19:30 +00:00
Tim Northover 0feb91ef15 ARM: teach LLVM that Cortex-A7 is very similar to A8.
llvm-svn: 205314
2014-04-01 14:10:07 +00:00
Aaron Ballman 8bf5a548ea Attempting to fix r205124, which had failed asserts when built with MSVC.
Suggestion from Yaron Keren.

llvm-svn: 205313
2014-04-01 13:56:35 +00:00
Tim Northover 1351030801 ARM: add cyclone CPU with ZeroCycleZeroing feature.
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.

llvm-svn: 205309
2014-04-01 13:22:02 +00:00
Daniel Sanders 21bce30fdc [mips] Renamed ParseAnyRegisterWithoutDollar to MatchAnyRegisterWithoutDollar
This is for consistency with other functions. The Parse* functions consume
tokens and the Match* functions don't.

No functional change.

llvm-svn: 205305
2014-04-01 12:35:23 +00:00
Aaron Ballman 0947bb20d8 Fixing an MSVC warning about widening the result of a 32-bit shift implicitly. No functional change intended.
llvm-svn: 205304
2014-04-01 12:24:25 +00:00
Tim Northover 4f1dd58e2e ARM64: add intrinsic for pmull (p64 x p64 = p128) operations.
llvm-svn: 205302
2014-04-01 12:22:37 +00:00
Aaron Ballman d1726ee8fa Fixing warnings in the MSVC build. No functional changes intended.
llvm-svn: 205301
2014-04-01 12:22:20 +00:00
Daniel Sanders ffd8436d6c [mips] Extend ParseJumpTarget to support the full symbol expression syntax.
Summary:
This should fix the issues the D3222 caused in lld. Testcase is based on
the one that failed in the buildbot.

Depends on D3233

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3234

llvm-svn: 205298
2014-04-01 10:41:48 +00:00
Daniel Sanders 315386c083 [mips] Use AsmLexer::peekTok() to resolve the conflict between $reg and $sym
Summary:
Parsing registers no longer consume the $ token before it's confirmed whether it really has a register or not, therefore it's no longer impossible to match symbols if registers were tried first.

Depends on D3232

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3233

llvm-svn: 205297
2014-04-01 10:40:14 +00:00
Daniel Sanders 0993457891 [mips] Hoist Parser.Lex() calls out of MatchAnyRegisterNameWithoutDollar()
Summary:
No functional change

Depends on D3222

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3232

llvm-svn: 205295
2014-04-01 10:37:46 +00:00
Tim Northover ff179ba3d3 ARM64: add patterns for more lane-wise ld1/st1 operations.
llvm-svn: 205294
2014-04-01 10:37:09 +00:00
Tim Northover d8d613b979 ARM64: fix bug in ld3r (1d) SelectionDAG.
llvm-svn: 205293
2014-04-01 10:37:03 +00:00
Daniel Sanders b50ccf8e26 [mips] Rewrite MipsAsmParser and MipsOperand.
Summary:
Highlights:
- Registers are resolved much later (by the render method).
  Prior to that point, GPR32's/GPR64's are GPR's regardless of register
  size. Similarly FGR32's/FGR64's/AFGR64's are FGR's regardless of register
  size or FR mode. Numeric registers can be anything.
- All registers are parsed the same way everywhere (even when handling
  symbol aliasing)
  - One consequence is that all registers can be specified numerically
    almost anywhere (e.g. $fccX, $wX). The exception is symbol aliasing
    but that can be easily resolved.
- Removes the need for the hasConsumedDollar hack
- Parenthesis and Bracket suffixes are handled generically
- Micromips instructions are parsed directly instead of going through the
  standard encodings first.
- rdhwr accepts all 32 registers, and the following instructions that previously
  xfailed now work:
    ddiv, ddivu, div, divu, cvt.l.[ds], se[bh], wsbh, floor.w.[ds], c.ngl.d,
    c.sf.s, dsbh, dshd, madd.s, msub.s, nmadd.s, nmsub.s, swxc1
- Diagnostics involving registers point at the correct character (the $)
- There's only one kind of immediate in MipsOperand. LSA immediates are handled
  by the predicate and renderer.

Lowlights:
- Hardcoded '$zero' in the div patterns is handled with a hack.
  MipsOperand::isReg() will return true for a k_RegisterIndex token
  with Index == 0 and getReg() will return ZERO for this case. Note that it
  doesn't return ZERO_64 on isGP64() targets.
- I haven't cleaned up all of the now-unused functions.
  Some more of the generic parser could be removed too (integers and relocs
  for example).
- insve.df needed a custom decoder to handle the implicit fourth operand that
  was needed to make it parse correctly. The difficulty was that the matcher
  expected a Token<'0'> but gets an Imm<0>. Adding an implicit zero solved this.

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3222

llvm-svn: 205292
2014-04-01 10:35:28 +00:00
Alexey Volkov 1328b28dc6 [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824

llvm-svn: 205288
2014-04-01 08:13:07 +00:00
Adam Nemet 10c4ce2584 [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well
Pretty obvious follow-on to r205159 to also handle conversion from double
besides float.

Fixes <rdar://problem/16373208>

llvm-svn: 205253
2014-03-31 21:54:48 +00:00
Matt Arsenault d6c4326786 R600/SI: Remove leftover pattern splitting 64-bit ors.
It's now matched to the scalar 64-bit or and split later if
necessary.'

llvm-svn: 205252
2014-03-31 21:46:46 +00:00
Manman Ren 63efd8e7e6 Register allocator: set CSRFirstUseCost to 5 for ARM64.
A value of 5 means if we have a split or spill option that has a really
low cost (1 << 14 is the entry frequency), we will choose to spill
or split the really cold path before using a callee-saved register.

This gives us the performance benefit on SPECInt2k and is also conservative.

rdar://16162005

llvm-svn: 205248
2014-03-31 21:06:36 +00:00
Matt Arsenault f751d6272d Change shouldSplitVectorElementType to better match the description.
Pass the entire vector type, and not just the element.

llvm-svn: 205247
2014-03-31 20:54:58 +00:00
Matt Arsenault d7bdcc46a6 R600/SI: Implement shouldConvertConstantLoadToIntImm
llvm-svn: 205244
2014-03-31 19:54:27 +00:00
Matt Arsenault 378bf9c68b R600: Compute masked bits for min and max
llvm-svn: 205242
2014-03-31 19:35:33 +00:00
Rafael Espindola ee1c342ef9 Don't relocate with sections if there might be a paired relocation.
llvm-svn: 205240
2014-03-31 19:00:23 +00:00
Daniel Sanders e34a120285 Revert: [mips] Rewrite MipsAsmParser and MipsOperand.' due to buildbot errors in lld tests.
It's currently unable to parse 'sym + imm' without surrounding parenthesis.

llvm-svn: 205237
2014-03-31 18:51:43 +00:00
Matt Arsenault 4c53717787 R600: Add BFE, BFI, and BFM intrinsics to help with writing tests.
llvm-svn: 205236
2014-03-31 18:21:18 +00:00
Matt Arsenault b34583661b R600: Add target nodes for BFM and BFI
llvm-svn: 205235
2014-03-31 18:21:13 +00:00
Saleem Abdulrasool 2070088bef ARM: fix typo
llvm-svn: 205233
2014-03-31 18:09:10 +00:00
Hal Finkel b4240ca0f4 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

llvm-svn: 205231
2014-03-31 17:48:16 +00:00
Daniel Sanders 0c648ba5be [mips] Rewrite MipsAsmParser and MipsOperand.
Summary:
Highlights:
- Registers are resolved much later (by the render method).
  Prior to that point, GPR32's/GPR64's are GPR's regardless of register
  size. Similarly FGR32's/FGR64's/AFGR64's are FGR's regardless of register
  size or FR mode. Numeric registers can be anything.
- All registers are parsed the same way everywhere (even when handling
  symbol aliasing)
  - One consequence is that all registers can be specified numerically
    almost anywhere (e.g. $fccX, $wX). The exception is symbol aliasing
    but that can be easily resolved.
- Removes the need for the hasConsumedDollar hack
- Parenthesis and Bracket suffixes are handled generically
- Micromips instructions are parsed directly instead of going through the
  standard encodings first.
- rdhwr accepts all 32 registers, and the following instructions that previously
  xfailed now work:
    ddiv, ddivu, div, divu, cvt.l.[ds], se[bh], wsbh, floor.w.[ds], c.ngl.d,
    c.sf.s, dsbh, dshd, madd.s, msub.s, nmadd.s, nmsub.s, swxc1
- Diagnostics involving registers point at the correct character (the $)
- There's only one kind of immediate in MipsOperand. LSA immediates are handled
  by the predicate and renderer.

Lowlights:
- Hardcoded '$zero' in the div patterns is handled with a hack.
  MipsOperand::isReg() will return true for a k_RegisterIndex token
  with Index == 0 and getReg() will return ZERO for this case. Note that it
  doesn't return ZERO_64 on isGP64() targets.
- I haven't cleaned up all of the now-unused functions.
  Some more of the generic parser could be removed too (integers and relocs
  for example).
- insve.df needed a custom decoder to handle the implicit fourth operand that
  was needed to make it parse correctly. The difficulty was that the matcher
  expected a Token<'0'> but gets an Imm<0>. Adding an implicit zero solved this.

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3222

llvm-svn: 205229
2014-03-31 17:43:46 +00:00
Hal Finkel 4c8f634f23 [PowerPC] Correct P7 dispatch unit allocation for vector instructions
llvm-svn: 205222
2014-03-31 17:02:10 +00:00
Eli Bendersky 6a0ccfb585 PR19099 - revert r203483
Now that r205212 was committed, r203483 is no longer necessary; it was a
temporary workaround that only handled a small number of the problematic cases.

llvm-svn: 205216
2014-03-31 16:11:57 +00:00
Christian Pirker 6d301b8ff2 ARM: change parameter names of the ELFARMAsmBackend constructor
I removed the underscore at the beginning of the parameter name,
because of a comment from Tim.

llvm-svn: 205215
2014-03-31 16:06:39 +00:00
Robert Khasanov ed0b2e9733 Test commit.
llvm-svn: 205214
2014-03-31 16:01:38 +00:00
Daniel Sanders d69adeb8e7 [mips] Fix use of uninitialized value reported by the sanitizer-x86_64-linux-bootstrap buildbot
llvm-svn: 205213
2014-03-31 15:58:58 +00:00
Eli Bendersky 264cd4672d Fix for PR19099 - NVPTX produces invalid symbol names.
This is a more thorough fix for the issue than r203483. An IR pass will run
before NVPTX codegen to make sure there are no invalid symbol names that can't
be consumed by the ptxas assembler.

llvm-svn: 205212
2014-03-31 15:56:26 +00:00
Tim Northover 5081cd0f81 ARM64: add extra patterns for scalar shifts
llvm-svn: 205209
2014-03-31 15:46:46 +00:00
Tim Northover e7834c3bbc ARM64: add extra scalar neg pattern & tests.
llvm-svn: 205208
2014-03-31 15:46:42 +00:00
Tim Northover 4468670345 ARM64: add patterns for scalar sqdmlal & sqdmlsl.
llvm-svn: 205207
2014-03-31 15:46:38 +00:00
Tim Northover 5731fc75af ARM64: add more patterns for commuted fmsub operations.
llvm-svn: 205206
2014-03-31 15:46:34 +00:00
Tim Northover 290e0698d4 ARM64: shuffle patterns around for fmin/fmax & add tests.
llvm-svn: 205205
2014-03-31 15:46:30 +00:00
Tim Northover 903814ccd6 ARM64: add more scalar patterns for usqadd & suqadd.
llvm-svn: 205204
2014-03-31 15:46:26 +00:00
Tim Northover 4c9d2c7e3f ARM64: add more scalar patterns for reciprocal ops.
llvm-svn: 205203
2014-03-31 15:46:22 +00:00
Tim Northover f48103618e ARM64: add i64 scalar pattern for @llvm.arm64.abs
This will be used by the Clang front-end code for vabsd_s64.

llvm-svn: 205202
2014-03-31 15:46:17 +00:00
Daniel Sanders a567da5a36 [mips] Implement missing relocations in the integrated assembler.
%got_hi, %got_lo, %call_hi, %call_lo, %higher, and %highest are now recognised
by MipsAsmParser::getVariantKind().

To prevent future issues with missing entries in this StringSwitch, I've added
an assertion to the default case.

llvm-svn: 205200
2014-03-31 15:15:02 +00:00
Daniel Sanders cefddb2ca6 Revert r205194 - [mips] Removed R_MIPS_GOT. It's identical to R_MIPS_GOT16.
There's a couple additional bits I missed.

llvm-svn: 205195
2014-03-31 14:34:36 +00:00
Daniel Sanders a104300dbe [mips] Removed R_MIPS_GOT. It's identical to R_MIPS_GOT16.
llvm-svn: 205194
2014-03-31 14:30:05 +00:00
Rafael Espindola 2378d4c0ce Capitalize the D in parseDirectiveGpDWord.
DWord seems to be the canonical way to camel case dword in llvm.

Thanks to Daniel Sander for noticing.

llvm-svn: 205191
2014-03-31 14:15:07 +00:00
Tom Stellard 30f59417cf R600/SI: Implement SIInstrInfo::isTriviallyRematerializable()
llvm-svn: 205188
2014-03-31 14:01:56 +00:00
Tom Stellard 7ea3d6d420 R600/SI: Lower i64 SELECT by bitcasting to a vector type
This allows allows us to replace ISD::EXTRACT_ELEMENT, which is lowered
using shifts, with ISD::EXTRACT_VECTOR_ELT, which is a no-op.

llvm-svn: 205187
2014-03-31 14:01:55 +00:00
Tom Stellard 7277b008ee R600/SI: Return the correct index for VGPRs in getHWRegIndex()
The register index is stored in the low 8-bits of the encoding.

llvm-svn: 205186
2014-03-31 14:01:52 +00:00
Zoran Jovanovic 9b05a31f76 Fixed issue with microMIPS JAL instruction.
Differential Revision: http://llvm-reviews.chandlerc.com/D3200

llvm-svn: 205185
2014-03-31 14:00:10 +00:00
Tim Northover 241856e5f8 ARM64: fix a couple of signed/unsigned comparison warnings.
llvm-svn: 205174
2014-03-31 10:21:36 +00:00
Alexey Samsonov 23aaf2a182 Try to fix MSan bootstrap bot: make ARM64Disassembler::getInstruction() always initialize Size argument.
llvm-svn: 205171
2014-03-31 07:59:33 +00:00
Yaron Keren 070a752d7e Correct OS conditionals following r204977 and r204978.
Previously, MinGW OS was Triple::MinGW and Cygwin was Triple::Cygwin
and now it is Triple::Win32 with Environment being GNU or Cygwin.
So,

  TheTriple.getOS() == Triple::Win32 
  
is replaced by

  TheTriple.isWindowsMSVCEnvironment()

and

  (TheTriple.getOS() == Triple::MinGW32 || TheTriple.getOS() == Triple::Cygwin)
  
is replaced by

  TheTriple.isOSCygMing()

llvm-svn: 205170
2014-03-31 07:59:14 +00:00
Craig Topper ec82847a64 [C++11] Mark more classes in the X86 target as 'final'.
llvm-svn: 205166
2014-03-31 06:53:13 +00:00
Craig Topper 26eec09d84 Mark a couple of the X86 target classes as final. Allows the compiler to de-virtualize some internal calls.
llvm-svn: 205165
2014-03-31 06:22:15 +00:00
NAKAMURA Takumi 82ec13e3d5 ARM64CollectLOH.cpp: Tweak \param. [-Wdocumentation]
llvm-svn: 205162
2014-03-31 01:10:26 +00:00
Chandler Carruth d28515af31 [ARM64] Fix materialization of an fp128 zero immediate. There currently
is not a pattern to lower this with clever instructions that zero the
register, so restrict the zero immediate legality special case to f64
and f32 (the only two sizes which fmov seems to directly support). Fixes
backend errors when building code such as libxml.

llvm-svn: 205161
2014-03-31 00:02:10 +00:00
Adam Nemet 6dafe97271 [X86] Adjust cost of FP_TO_UINT v8f32->v8i32
There is no direct AVX instruction to convert to unsigned.  I have some ideas
how we may be able to do this with three vector instructions but the current
backend just bails on this to get it scalarized.

See the comment why we need to adjust the cost returned by BasicTTI.

The test is a bit roundabout (and checks assembly rather than bit code) because
I'd like it to work even if at some point we could vectorize this conversion.

Fixes <rdar://problem/16371920>

llvm-svn: 205159
2014-03-30 18:07:13 +00:00
Stepan Dyatkovskiy 8baf17fc5f PR18929:
According to ARM assembler language hash symbol is optional before immediates.
For example, see here for more details:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473j/dom1359731154529.html

llvm-svn: 205157
2014-03-30 17:09:54 +00:00
Hal Finkel 5c0d1454d6 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

llvm-svn: 205146
2014-03-30 13:22:59 +00:00
Chandler Carruth 81f7061065 [ARM64] Fix a heap-use-after-free spotted by ASan.
StringRef::lower() returns a std::string. Better yet, we can now stop
thinking about what it returns and write 'auto'. It does the right
thing. =]

llvm-svn: 205135
2014-03-30 09:08:07 +00:00
Tim Northover bf679cec67 ARM64: uncopy/paste helper function
It was doing functional but highly suspect operations on bools due to
the more limited shifting operands supported by memory instructions.

Should fix some MSVC warnings.

llvm-svn: 205134
2014-03-30 08:30:28 +00:00
Tim Northover 6b3258f087 ARM64: remove unused variables
llvm-svn: 205133
2014-03-30 07:35:48 +00:00
Tim Northover 3e52557212 ARM64: override all the things.
Actually, mostly only those in the top-level directory that already
had a "virtual" attached. But it's the thought that counts and it's
been a long day.

llvm-svn: 205131
2014-03-30 07:25:18 +00:00
NAKAMURA Takumi 09717bd1c4 X86Subtarget.h: isTargetWindows() should tell whether he is targeting msvc.
FYI, !isWindowsGNUEnvironment() is insufficient. It missed cygwin.

FIXME: The name "isTargetWindows" should be fixed.
llvm-svn: 205124
2014-03-30 04:35:00 +00:00
Dmitri Gribenko 1fd72104ad Fix a few -Wdocumentation warnings
llvm-svn: 205116
2014-03-29 19:40:32 +00:00
Benjamin Kramer 3ad660a515 Detemplatize LOHDirective.
The ARM64 backend uses it only as a container to keep an MCLOHType and
Arguments around so give it its own little copy. The other functionality
isn't used and we had a crazy method specialization hack in place to
keep it working. Unfortunately that was incompatible with MSVC.

Also range-ify a couple of loops while at it.

llvm-svn: 205114
2014-03-29 19:21:20 +00:00
Benjamin Kramer 61e595be4d ARM64: Remove unused helper function, make others static.
llvm-svn: 205112
2014-03-29 18:00:49 +00:00
Hal Finkel 777c9dd90a [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

llvm-svn: 205106
2014-03-29 16:04:40 +00:00
Tim Northover adbd34e045 ARM64: format register strings without creating a local Twine.
It was causing horrible failures on some build-bots.

llvm-svn: 205105
2014-03-29 15:35:57 +00:00
Hal Finkel e8fba98735 [PowerPC] VSX instruction latency corrections
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.

llvm-svn: 205096
2014-03-29 13:20:31 +00:00
Stepan Dyatkovskiy df657cc1d5 Recommitted fix for PR18931, with extended tests set.
Issue subject: Crash using integrated assembler with immediate arithmetic

Fix description:
Expressions like 'cmp r0, #(l1 - l2) >> 3' could not be evaluated on asm parsing stage,
since it is impossible to resolve labels on this stage. In the end of stage we still have
expression (MCExpr).
Then, when we want to encode it, we expect it to be an immediate, but it still an expression.
Patch introduces a Fixup (MCFixup instance), that is processed after main encoding stage.

llvm-svn: 205094
2014-03-29 13:12:40 +00:00
Tim Northover 2125374ecf ARM64: use 64-bit constant even on 32-bit machines
Another existing bot failure so no tests.

llvm-svn: 205093
2014-03-29 11:51:49 +00:00
Tim Northover 2011df293d ARM64: change format specifier to work on 32-bit targets
Existing tests were failing.

llvm-svn: 205092
2014-03-29 11:47:07 +00:00
Chandler Carruth 7b7a67c5c8 [ARM64] Fix 'assert("...")' to be 'assert(0 && "...")'. Otherwise, it is
no assert at all. ;] Some of these should probably be switched to
llvm_unreachable, but I didn't want to perturb the behavior in this
patch.

Found by -Wstring-conversion, which I'll try to turn on in CMake builds
at least as it is finding useful things.

llvm-svn: 205091
2014-03-29 11:07:40 +00:00
Tim Northover 00ed9964c6 ARM64: initial backend import
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.

Everything will be easier with the target in-tree though, hence this
commit.

llvm-svn: 205090
2014-03-29 10:18:08 +00:00
Rafael Espindola 5904e12bfa Completely rewrite ELFObjectWriter::RecordRelocation.
I started trying to fix a small issue, but this code has seen a small fix too
many.

The old code was fairly convoluted. Some of the issues it had:

* It failed to check if a symbol difference was in the some section when
  converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
  relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
  requiring a quiet nasty work around in ARM.
* It was missing comments.

llvm-svn: 205076
2014-03-29 06:26:49 +00:00
Hal Finkel 19be506a5e [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075
2014-03-29 05:29:01 +00:00
Akira Hatanaka 9afbb8c2b1 [x86] Fix printing of register operands with q modifier.
Emit 32-bit register names instead of 64-bit register names if the target does
not have 64-bit general purpose registers.

<rdar://problem/14653996>

llvm-svn: 205067
2014-03-28 23:28:07 +00:00
David Majnemer 02f2188bb9 X86: Disable IsLegalToCallImmediateAddr for Win32
WinCOFF cannot form PC relative relocations to support absolute
MCValues.  We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.

This fixes PR19272.

llvm-svn: 205058
2014-03-28 21:40:47 +00:00
Hal Finkel 2583b06310 [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

llvm-svn: 205046
2014-03-28 20:24:55 +00:00
Hal Finkel 7811c6188e [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

llvm-svn: 205041
2014-03-28 19:58:11 +00:00
Rafael Espindola b59fb7347a Parse .gpdword and convert another llc -filetype=obj test.
llvm-svn: 205028
2014-03-28 18:50:26 +00:00
Rafael Espindola d7610a5d67 Add const to a method I missed in the previous commit.
llvm-svn: 205014
2014-03-28 16:14:12 +00:00
Rafael Espindola 3e3de5e353 Add const.
llvm-svn: 205013
2014-03-28 16:06:09 +00:00
Erik Verbruggen 5e1bac3a38 Revert "InstCombine: merge constants in both operands of icmp."
This reverts commit r204912, and follow-up commit r204948.

This introduced a performance regression, and the fix is not completely
clear yet.

llvm-svn: 205010
2014-03-28 14:50:57 +00:00
Christian Pirker 2a11160956 Add ARM big endian Target (armeb, thumbeb)
Reviewed at http://llvm-reviews.chandlerc.com/D3095

llvm-svn: 205007
2014-03-28 14:35:30 +00:00
Tim Northover 24f46618b2 R600: avoid calling std::next on an iterator that might be end()
This was causing my llc to go into an infinite loop on
CodeGen/R600/address-space.ll (just triggered recently by some allocator
changes).

llvm-svn: 205005
2014-03-28 13:52:56 +00:00
Hal Finkel c6fc9b8960 [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

llvm-svn: 204980
2014-03-27 23:12:31 +00:00
Saleem Abdulrasool edbdd2e5df Canonicalise Windows target triple spellings
Construct a uniform Windows target triple nomenclature which is congruent to the
Linux counterpart.  The old triples are normalised to the new canonical form.
This cleans up the long-standing issue of odd naming for various Windows
environments.

There are four different environments on Windows:

MSVC: The MS ABI, MSVCRT environment as defined by Microsoft
GNU: The MinGW32/MinGW32-W64 environment which uses MSVCRT and auxiliary libraries
Itanium: The MSVCRT environment + libc++ built with Itanium ABI
Cygnus: The Cygwin environment which uses custom libraries for everything

The following spellings are now written as:

i686-pc-win32 => i686-pc-windows-msvc
i686-pc-mingw32 => i686-pc-windows-gnu
i686-pc-cygwin => i686-pc-windows-cygnus

This should be sufficiently flexible to allow us to target other windows
environments in the future as necessary.

llvm-svn: 204977
2014-03-27 22:50:05 +00:00
Hal Finkel 9dcb3583d5 [PowerPC] Don't remove self VSX copies in PPCInstrInfo::copyPhysReg
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.

llvm-svn: 204976
2014-03-27 22:46:28 +00:00
Quentin Colombet 85b904d875 [X86][Vector Cost Model] Add a comment to explain the workaround
in my previous commit (r204884).

<rdar://problem/16381225>

llvm-svn: 204972
2014-03-27 22:27:41 +00:00
Hal Finkel 82569b6366 [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

llvm-svn: 204971
2014-03-27 22:22:48 +00:00