Commit Graph

8811 Commits

Author SHA1 Message Date
Richard Sandiford 48ef6abddc [SystemZ] Optimize selects between 0 and -1
Since z has no setcc instruction as such, the choice of setBooleanContents
is a bit arbitrary.  Currently it's set to ZeroOrOneBooleanContent,
so we produced a branch-free form when selecting between 0 and 1,
but not when selecting between 0 and -1.  This patch handles the latter
case too.

At some point I'd like to measure whether it's better to use conditional
moves for constant selects on z196, but that's future work.

llvm-svn: 196578
2013-12-06 09:53:09 +00:00
Juergen Ributzka e3cba95d5c [Stackmap] Update stackmap unit test to use AnyRegCC.
llvm-svn: 196552
2013-12-06 00:28:54 +00:00
Andrew Trick 880e573d98 MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516
2013-12-05 17:55:58 +00:00
Justin Holewinski 4459717bab [NVPTX] Fix off-by-one error when creating the VT list for an SDNode
llvm-svn: 196503
2013-12-05 12:58:00 +00:00
Matheus Almeida a6beac1acc [mips] Small code generation improvement for conditional operator (select)
in case the operands are constants and its difference is |1|.
It should be possible in those cases to rematerialize the result using
MIPS's slt and similar instructions.

The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
otherwise the optimization implemented in this patch would have been triggered
(difference between the operands was 1) and that would have changed the semantic
of the tests.

llvm-svn: 196498
2013-12-05 12:07:05 +00:00
Tim Northover e4def5e228 ARM: fix yet another stack-folding bug
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.

Should fix PR18136.

llvm-svn: 196493
2013-12-05 11:02:02 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Rafael Espindola 01d19d0299 Hide the stub created for MO_ExternalSymbol too.
given

declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
  call void @foo()
  call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
  ret void
}

We used to produce

L_foo$stub:
        .indirect_symbol        _foo
        .ascii  "\364\364\364\364\364"

_memset$stub:
        .indirect_symbol        _memset
        .ascii  "\364\364\364\364\364"

We not produce a private stub for memset too.

Stubs are not needed with recent linkers, but we still produce them for darwin8.

Thanks to David Fang for confirming that gcc used to do this too.

llvm-svn: 196468
2013-12-05 05:19:12 +00:00
Matt Arsenault 89cc49fe5d R600/SI: Add comments for number of used registers.
llvm-svn: 196467
2013-12-05 05:15:35 +00:00
Jiangning Liu 65d8e3422a For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64.
llvm-svn: 196456
2013-12-05 02:12:01 +00:00
Cameron McInally 164097a6eb Add FileCheck statements for r196435.
llvm-svn: 196449
2013-12-05 01:20:36 +00:00
Cameron McInally 30bbb214e5 Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load.
Patch by Aleksey Bader.

llvm-svn: 196435
2013-12-05 00:11:25 +00:00
David Peixotto 8ad70b3542 Add support for parsing ARM symbol variants on ELF targets
ARM symbol variants are written with parens instead of @ like this:

  .word __GLOBAL_I_a(target1)

This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.

The MCAsmInfo parens flag is enabled only for ARM on ELF.

By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.

To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.

Updated Tests:
  Changed case of symbol variant to match the generic kind.
  test/CodeGen/ARM/tls-models.ll
  test/CodeGen/ARM/tls1.ll
  test/CodeGen/ARM/tls2.ll
  test/CodeGen/Thumb2/tls1.ll
  test/CodeGen/Thumb2/tls2.ll

PR18080

llvm-svn: 196424
2013-12-04 22:43:20 +00:00
Cameron McInally cbb51dacfb Fix assembly syntax for AVX512 vector blend instructions.
llvm-svn: 196393
2013-12-04 18:05:36 +00:00
Cameron McInally c5f420e129 Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.
Patch by Aleksey Bader.

llvm-svn: 196386
2013-12-04 14:52:33 +00:00
Kevin Qin afd095de8b [AArch64 Neon] Add ACLE intrinsic vceqz_f64.
llvm-svn: 196362
2013-12-04 08:02:34 +00:00
Kevin Qin f9832e8de7 [AArch64 NEON] Add missing compare intrinsics.
llvm-svn: 196360
2013-12-04 07:53:28 +00:00
Rafael Espindola 757ae64731 Add -mcpu=core2 to all llc invocations in this test.
Should fix the atom buildbot.

llvm-svn: 196340
2013-12-04 01:25:24 +00:00
Juergen Ributzka 17e0d9ee6c [Stackmap] Emit multi-byte nops for X86.
llvm-svn: 196334
2013-12-04 00:39:08 +00:00
Reed Kotler 59975c2cba final patch for very long conditional branches for mips16 constant islands.
this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.

llvm-svn: 196331
2013-12-03 23:42:51 +00:00
Rafael Espindola f4d34153f5 Use CHECK-LABEL to make this test more strict.
llvm-svn: 196321
2013-12-03 21:12:36 +00:00
Rafael Espindola 0a2baf8eaf Fix mingw32 thiscall + sret.
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)

This fixes, for example, calling stringstream::str.

llvm-svn: 196312
2013-12-03 20:51:23 +00:00
James Molloy 8a25992f39 Addrspacecasts are no-ops on ARM.
Testcase added.

llvm-svn: 196269
2013-12-03 11:23:11 +00:00
Richard Sandiford ccc2a7c1a0 [SystemZ] Fix choice of known-zero mask in insertion optimization
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.

Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy.  The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.

llvm-svn: 196267
2013-12-03 11:01:54 +00:00
Michael Liao 14b02848a3 Enhance the fix of PR17631
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
  not be issued before 'call' insn. There're other cases where helper
  calls will be inserted not limited to epilog. These helper calls do
  not follow the standard calling convention and won't clobber any YMM
  registers. (So far, all call conventions will clobber any or part of
  YMM registers.)
  This patch enhances the previous fix to cover more cases 'vzerosupper' should
  not be inserted by checking if that function call won't clobber any YMM
  registers and skipping it if so.

llvm-svn: 196261
2013-12-03 09:17:32 +00:00
Hao Liu dca64f4a20 [AArch64]Add missing floating point convert, round and misc intrinsics.
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn

llvm-svn: 196210
2013-12-03 06:06:55 +00:00
Hao Liu c250cbc095 AArch64: add missing ACLE intrinsics mapping to general arithmetic operation from VFP instructions.
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm.

llvm-svn: 196208
2013-12-03 05:58:30 +00:00
Hao Liu 21a461353a AArch64: Add missing scalar pair intrinsics.
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s".

llvm-svn: 196198
2013-12-03 03:39:47 +00:00
Jiangning Liu 3a541d46a1 Add some missing pattern matches for AArch64 Neon intrinsics like vuqadd_s64 and friends.
llvm-svn: 196192
2013-12-03 01:33:52 +00:00
Jiangning Liu 94a7bb2130 Add some missing pattern matches for AArch64 Neon intrinsics like vmull_high_n_s16 and friends.
llvm-svn: 196190
2013-12-03 01:29:32 +00:00
Chad Rosier 3106de3f9d [AArch64] Implemented vcopy_lane patterns using scalar DUP instruction.
Patch by Ana Pazos!

llvm-svn: 196151
2013-12-02 21:05:16 +00:00
Vincent Lejeune 4b8d9e303c R600: Workaround for cayman loop bug
llvm-svn: 196121
2013-12-02 17:29:37 +00:00
Tim Northover dee8604caf ARM: decide whether to use movw/movt based on "minsize" attribute.
llvm-svn: 196102
2013-12-02 14:46:26 +00:00
Robert Lytton 7fbca3cce0 XCore target: Make handling of large frames not dependent upon an FP.
eliminateFrameIndex() has been reworked to handle both small & large frames
with either a FP or SP.
An additional Slot is required for Scavenging spills when not using FP for large frames.
Reworked the handling of Register Scavenging.

Whether we are using an FP or not, whether it is a large frame or not,
and whether we are using a large code model or not are now independent.

llvm-svn: 196091
2013-12-02 11:05:28 +00:00
Tim Northover 72360d201c ARM: add pseudo-instructions for lit-pool global materialisation
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.

With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.

llvm-svn: 196090
2013-12-02 10:35:41 +00:00
Robert Lytton d3ffa66c6c XCore target: fix large code model 'select' indirect address handling.
llvm-svn: 196088
2013-12-02 10:18:37 +00:00
Robert Lytton ff38d37c77 XCore target: Add large code model
When using large code model:
Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large"
The folded global address of such objects are lowered into the const pool.

During inspection it was noted that LowerConstantPool() was using a default offset of zero.
A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental.

Correct the flags emitted for explicitly specified sections.

We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize.
To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required.

llvm-svn: 196087
2013-12-02 10:18:31 +00:00
Robert Lytton 7f4d1c9d11 XCore target: extend tests in preparation
llvm-svn: 196086
2013-12-02 10:18:24 +00:00
Robert Lytton 0abd2c96b5 XCore target: Fix eliminateFrameIndex() to handle large frames
Large frame offsets are loaded from the ConstantPool.
Where possible, offsets are encoded using the smaller MKMSK instruction.
Large frame offsets can only be used when there is a frame-pointer.

llvm-svn: 196085
2013-12-02 10:18:19 +00:00
Robert Lytton a9f984fb76 XCore target: Enable frames larger than 65535 to be lowered
llvm-svn: 196084
2013-12-02 10:18:14 +00:00
Rafael Espindola aaaf02206a Also test the created stubs on 32 bits.
llvm-svn: 196052
2013-12-01 21:24:30 +00:00
Andrew Trick ca45c817c3 Add -mcpu to stackmap.ll
llvm-svn: 196051
2013-12-01 18:17:05 +00:00
Tim Northover 45479dcf49 ARM: fix bug in -Oz stack adjustment folding
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.

This should fix PR18081.

llvm-svn: 196046
2013-12-01 14:16:24 +00:00
Hal Finkel ca93e47258 Convert a PPC test from grep to FileCheck
Convert this test to FileCheck, and improve it to check for the instructions it
is trying to exclude instead of checking for register use (especially because
grepping for r1 can be thrown off, for example, by a use of r12).

llvm-svn: 195979
2013-11-30 20:04:33 +00:00
Hal Finkel 2651f97333 Desensitize a couple of PPC regression tests
Use CHECK-DAG to make these regression tests more resilient against changes in
instruction scheduling.

llvm-svn: 195978
2013-11-30 19:52:28 +00:00
Hal Finkel 2b655bb228 Update the cpu specified on some PPC regression tests
Some of these tests did not specify a cpu but were also sensitive to
instruction scheduling and/or register assignment choices. A few others
similarly-sensitive tests specified a cpu (often the POWER7), and while the P7
currently uses the default model for PPC64, this will soon change. For those
tests which should not really be cpu-dependent anyway, the cpu is set to the
generic 'ppc64'.

llvm-svn: 195977
2013-11-30 19:39:27 +00:00
Daniel Sanders 7fd68d6018 [mips][msa] MSA loads and stores have a 10-bit offset. Account for this when lowering FrameIndex.
This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s
when the stack frame is between 512 and 32,768 bytes in size.

llvm-svn: 195973
2013-11-30 13:47:57 +00:00
Juergen Ributzka 5b6234dc4a Force CPU type to unbreak unit tests on Haswell machines.
llvm-svn: 195971
2013-11-30 03:07:16 +00:00
Reed Kotler ad450f239f Part 1 of 3 patches that completes very long conditional branches
in constant islands for Mips16. We introdcuce JalB16 as a synomnym
for Jal16. It makes it easier to read and is also necessary because
Jal16 is a call instruction but JalB16 is being used as a branch.
Various parts of LLVM will not work properly even in this late stage of
the backend if we use what was declared as a call instruction to function
as a branch. For one, basic block labels may not get emitted in some
situations. 

llvm-svn: 195968
2013-11-29 22:32:56 +00:00
Hao Liu ba38eee8ac AArch64: The pattern match should check the range of the immediate value.
Or we can generate some illegal instructions.
E.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16].

llvm-svn: 195941
2013-11-29 02:11:22 +00:00
Jiangning Liu f7b4c7c2ce Add missing test case for bsl_f64 support of AArch64 NEON.
llvm-svn: 195939
2013-11-29 01:38:08 +00:00
Reed Kotler 0d409e2dfe Check in conditional branches for constant islands. Still need to finish
conditional branches for very large targets. That will be the next small
patch. Everything now should in principle work as good (functionality
wise) as without constant islands so we decided at Mips/Imagination to
make constant islands the default for Mips16 now so that it will get
excercised a lot and this port is still experimentatl though hopefully soon
we will change the status. Some more cleanup and code review is in order
but things are converging fast.

llvm-svn: 195902
2013-11-28 00:56:37 +00:00
Akira Hatanaka 168d4e5b20 [mips] Implement the following optimizations using dominance information to
make PIC calls a little more efficient:

1. Remove instructions setting up $gp if it is known that a function has been
   called at least once.
2. Save the address of a called function in a register instead of loading
   it from the GOT at every call site.

llvm-svn: 195892
2013-11-27 23:38:42 +00:00
Tom Stellard 175e7a8c97 R600: Expand vector FABS
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
2013-11-27 21:23:39 +00:00
Tom Stellard c149dc02d3 R600/SI: Implement spilling of SGPRs v5
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.

v2:
  - Fix encoding of Lane Mask
  - Use correct register flags, so we don't overwrite the low dword
    when restoring multi-dword registers.

v3:
  - Register spilling seems to hang the GPU, so replace all shaders
    that need spilling with a dummy shader.

v4:
  - Fix *LANE definitions
  - Change destination reg class for 32-bit SMRD instructions

v5:
  - Remove small optimization that was crashing Serious Sam 3.

https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
2013-11-27 21:23:35 +00:00
Tom Stellard 859199dad8 R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
2013-11-27 21:23:29 +00:00
Tom Stellard 4d566b2edf R600: Add support for ISD::FROUND
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
2013-11-27 21:23:20 +00:00
Rafael Espindola 745bd85c6a Use FileCheck and expand the test a bit.
In particular, check the name of the symbol we are putting in the constant pool.

llvm-svn: 195865
2013-11-27 19:22:14 +00:00
Jiangning Liu 97aa8cf8b7 Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics.
llvm-svn: 195843
2013-11-27 14:02:25 +00:00
Rafael Espindola 09cf06c75e Cleanup and test X86AsmPrinter::printPCRelImm.
It is only used for asm printing.

On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.

MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.

The MO_GlobalAddress and MO_Register cases were not tested.

llvm-svn: 195824
2013-11-27 06:53:13 +00:00
Chad Rosier 75290c6307 [AArch64] Add support for NEON scalar floating-point absolute difference.
llvm-svn: 195803
2013-11-27 01:45:58 +00:00
Chad Rosier 9653d5c989 [AArch64] Add support for NEON scalar floating-point to integer convert
instructions.

llvm-svn: 195788
2013-11-26 22:17:37 +00:00
Reed Kotler 3aeb1d0857 Fix a bug related to constant islands for Mips16 and mips16/32 dual mode.
The determination of when we are doing constant pools was being made too
early in the asm printer.

llvm-svn: 195781
2013-11-26 20:38:40 +00:00
Michael Liao d617a3015d Fix PR18054
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
  lowering where we need to check whether x is a vector type (in-reg
  type) of i8, i16 or i32; otherwise, that optimization is not valid.

llvm-svn: 195779
2013-11-26 20:31:31 +00:00
Tim Northover fa36dfeeca Darwin-ARM: use movw/movt for static relocations
llvm-svn: 195759
2013-11-26 12:45:05 +00:00
Richard Sandiford dd7dd930d1 [SystemZ] Fix incorrect use of RISBG for a zero-extended right shift
We would wrongly transform the testcase into the equivalent of an AND with 1.
The problem was that, when testing whether the shifted-in bits of the right
shift were significant, we used the width of the final zero-extended result
rather than the width of the shifted value.

llvm-svn: 195731
2013-11-26 10:53:16 +00:00
Kevin Qin 599c47d0de Refactored the implementation of AArch64 NEON instruction ZIP, UZP
and TRN.
Fix a bug when mixed use of vget_high_u8() and vuzp_u8().

llvm-svn: 195716
2013-11-26 03:26:47 +00:00
Kevin Qin 33ca18fdcf [AArch64]Implement 128 bit register copy with NEON.
llvm-svn: 195713
2013-11-26 02:33:42 +00:00
Andrew Trick 391dbadb51 StackMap: Implement support for DirectMemRefOp.
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)

Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.

llvm-svn: 195712
2013-11-26 02:03:25 +00:00
Cameron McInally c592e5251c Add an intrinsic for the SSE2 PAUSE instruction.
llvm-svn: 195697
2013-11-26 00:20:43 +00:00
Bill Wendling 9200bb08f9 Unrevert r195599 with testcase fix.
I'm not sure how it was checking for the wrong values...
PR18023.

llvm-svn: 195670
2013-11-25 18:05:22 +00:00
Amara Emerson 34df448f7c [ARM] Enable FeatureMP for Cortex-A5 by default.
Patch by Oliver Stannard.

llvm-svn: 195640
2013-11-25 13:17:15 +00:00
Amara Emerson f59125f5bb Revert r195599 as it broke the builds.
llvm-svn: 195636
2013-11-25 11:24:18 +00:00
Daniel Sanders b021c6fdbd Fixed tryFoldToZero() for vector types that need expansion.
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.

Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.

Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.

Reviewers: resistor

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2251

llvm-svn: 195635
2013-11-25 11:14:43 +00:00
Bill Wendling e3c48709ed Don't look past volatile loads.
A volatile load should block us from trying to coalesce stores.
PR18023

llvm-svn: 195599
2013-11-25 05:01:21 +00:00
Venkatraman Govindaraju 1116868a0d [Sparc] Emit large negative adjustments to SP/FP with sethi+xor instead of sethi+or. This generates correct code for both sparc32 and sparc64.
llvm-svn: 195576
2013-11-24 20:23:25 +00:00
Venkatraman Govindaraju f79528c132 [SparcV9]: Do not emit .register directives for global registers that are clobbered by calls but not used in the function itself.
llvm-svn: 195574
2013-11-24 18:41:49 +00:00
Venkatraman Govindaraju 0510db0597 [SparcV9] Enable custom lowering of DYNAMIC_STACKALLOC in sparc64.
llvm-svn: 195573
2013-11-24 17:41:41 +00:00
Reed Kotler a787aa2b1e Make sure that for C++ emitting LwConstant32 pseudos, that it corresponds
to what is needed for constant islands. The prescan method for Mips16 constant
islands will eventually go away. It is only temporary and should be done
earlier when the instructions are first created or from the DAG. If we keep
it here we need to handle better the situation where constant islands
is called multiple times since don't want to prescan more than once.

llvm-svn: 195569
2013-11-24 06:18:50 +00:00
Reed Kotler ed00c59cdb Update older test cases for latest patch.
llvm-svn: 195566
2013-11-24 03:37:56 +00:00
Reed Kotler d3b28ebe03 Fix a funny bug I introduced during conversion of ARM constant islands to Mips.
I had to move some code and I moved a declaration forward past it's first use
in the function but by nutty coincidence there was another variable of the same
name and type and  with completely unrelated function that was declared globally
in the class so no compilation error ensued.
It required some unusual conditions for it to even matter. Caused test
case casts.c in test-suite to fail during compilation with a duplicate 
symbol error. I would have noticed it during final code review for this port.

llvm-svn: 195565
2013-11-24 02:53:09 +00:00
Manman Ren d664bd7725 Debug Info: update testing cases to specify the debug info version number.
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.

Make tests more robust by removing hard-coded metadata numbers in CHECK lines.

llvm-svn: 195535
2013-11-23 01:16:29 +00:00
Tom Stellard c0845334da R600/SI: Fixing handling of condition codes
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
2013-11-22 23:07:58 +00:00
Manman Ren 409558f81e Debug Info: update testing cases to specify the debug info version number.
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.

llvm-svn: 195504
2013-11-22 21:49:45 +00:00
Jim Grosbach 860934a924 X86: Perform integer comparisons at i32 or larger.
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.

rdar://15386341

llvm-svn: 195496
2013-11-22 19:57:47 +00:00
Paul Robinson d89125a5d8 Teach ISel not to optimize 'optnone' functions (revised).
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
  SelectAllBasicBlocks; the flag is checked in various places, and
  FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
  something more subtle that doesn't work everywhere.

Based on work by Andrea Di Biagio.

llvm-svn: 195491
2013-11-22 19:11:24 +00:00
Andrew Trick 4a1abb7ab5 patchpoint: factor SD builder code for live vars. Plain stackmap also optimizes Constant values now.
llvm-svn: 195488
2013-11-22 19:07:36 +00:00
Michael Liao 02160d580b Fix PR18014
- When simplifying the mask generation for BLEND, check whether that mask is
  also consumed by other non-BLEND insns. If true, skip that simplification.

llvm-svn: 195476
2013-11-22 17:56:57 +00:00
Richard Sandiford f03789ca3f [SystemZ] Fix TMHH and TMHL usage for z10 with -O0
I've no idea why I decided to handle TMxx differently from all the other
high/low logic operations, but it was a stupid thing to do.  The high
registers aren't available as separate 32-bit registers on z10,
so subreg_h32 can't be used on a GR64 there.

I've normally been testing with z196 and with -O3 and so hadn't noticed
this until now.

llvm-svn: 195473
2013-11-22 17:28:28 +00:00
Daniel Sanders b516aae48e [mips][msa] Add test case that should have been added in r195456.
llvm-svn: 195469
2013-11-22 15:47:18 +00:00
Rafael Espindola 5a8e985ad3 Don't produce tail calls when the caller is x86_thiscallcc.
The callee will not pop the stack for us.

llvm-svn: 195467
2013-11-22 15:18:28 +00:00
Tim Northover 74e3637a0c ARM: use CHECK-LABEL on a test.
llvm-svn: 195457
2013-11-22 13:25:07 +00:00
Richard Barton c31078cded Add support for Cortex-A12.
Patch by Oliver Stannard!

llvm-svn: 195448
2013-11-22 11:53:16 +00:00
Daniel Sanders fd8e416879 [mips][msa] Float vector constants cannot use ldi.[wd] directly. Bitcast from the appropriate integer vector type.
Fixes an instruction selection failure detected by llvm-stress.

llvm-svn: 195444
2013-11-22 11:24:50 +00:00
Kostya Serebryany 4007009815 Revert r195318 as it causes miscompilation (PR18029)
llvm-svn: 195439
2013-11-22 10:30:39 +00:00
Hao Liu 25aed9bb5b Fix the bugs about AArch64 Load/Store vector types and bitcast between i64 and vector types.
e.g. "%tmp = load <2 x i64>* %ptr" can't be selected. 
     "%tmp = bitcast i64 %in to <2 x i32>" can't be selected.

llvm-svn: 195424
2013-11-22 08:47:22 +00:00
Jiangning Liu a91633a435 For AArch64 back-end instruction selection, lower Neon_Lowxxx with EXTRCT_SUBREG.
llvm-svn: 195408
2013-11-22 02:45:13 +00:00
NAKAMURA Takumi 08be5afea4 Tweak 3 tests in llvm/test/CodeGen/X86 to add -mcpu=generic since r195383.
They failed on bdver2 buildslave.

FIXME: FileCheck-ize them.
llvm-svn: 195407
2013-11-22 02:28:04 +00:00
Tom Stellard 06c67bcbe4 SelectionDAG: Optimize expansion of vec_type = BITCAST scalar_type
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195398
2013-11-22 00:41:05 +00:00
Ekaterina Romanova d5fa55470c SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Artyom Skrobov 97af2ec096 [ARM] add the overlooked tests for Cortex-A7 build attributes
llvm-svn: 195365
2013-11-21 16:22:39 +00:00
Daniel Sanders c8c50fb41f [mips][msa] Fix a corner case in performORCombine() when combining nodes into VSELECT.
Mask == ~InvMask asserts if the width of Mask and InvMask differ.
The combine isn't valid (with two exceptions, see below) if the widths differ
so test for this before testing Mask == ~InvMask.

In the specific cases of Mask=~0 and InvMask=0, as well as Mask=0 and
InvMask=~0, the combine is still valid. However, there are more appropriate
combines that could be used in these cases such as folding x & 0 to 0, or
x & ~0 to x.

llvm-svn: 195364
2013-11-21 16:11:31 +00:00
Daniel Sanders edc071b815 Add support for legalizing SETNE/SETEQ by inverting the condition code and the result of the comparison.
Summary:
LegalizeSetCCCondCode can now legalize SETEQ and SETNE by returning the inverse
condition and requesting that the caller invert the result of the condition.

The caller of LegalizeSetCCCondCode must handle the inverted CC, and they do
so as follows:
  SETCC, BR_CC:
    Invert the result of the SETCC with SelectionDAG::getNOT()
  SELECT_CC:
    Swap the true/false operands.

This is necessary for MSA which lacks an integer SETNE instruction.

Reviewers: resistor

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2229

llvm-svn: 195355
2013-11-21 13:24:49 +00:00
Daniel Sanders 6e664bcef3 [mips][msa/dsp] Only do DSP combines if DSP is enabled.
Fixes a crash (null pointer dereferenced) when MSA is enabled.

llvm-svn: 195343
2013-11-21 11:40:14 +00:00
NAKAMURA Takumi 43aa939625 Revert r195317 (and r195333), "Teach ISel not to optimize 'optnone' functions."
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".

FYI, it didn't appear to add either "-O0" or "-fast-isel".

llvm-svn: 195339
2013-11-21 10:55:15 +00:00
Kostya Serebryany ef26f4ed56 add 'REQUIRES: asserts' to a test that uses 'llc -debug'; this fixes the no-asserts build
llvm-svn: 195333
2013-11-21 09:28:16 +00:00
Ana Pazos 9ac2fc85d2 Implemented Neon scalar vdup_lane intrinsics.
Fixed scalar dup alias and added test case.

llvm-svn: 195330
2013-11-21 08:16:15 +00:00
Ana Pazos fbc1adbaa7 Implemented Neon scalar by element intrinsics.
Intrinsics implemented: vqdmull_lane, vqdmulh_lane, vqrdmulh_lane,
vqdmlal_lane, vqdmlsl_lane scalar Neon intrinsics.

llvm-svn: 195327
2013-11-21 07:37:04 +00:00
Bill Wendling 07787f8747 The basic problem is that some mainstream programs cannot deal with the way
clang optimizes tail calls, as in this example:

int foo(void);
int bar(void) {
 return foo();
}

where the call is transformed to:

  calll .L0$pb
.L0$pb:
  popl  %eax
.Ltmp0:
  addl  $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
  movl  foo@GOT(%eax), %eax
  popl  %ebp
  jmpl  *%eax                   # TAILCALL

However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.

This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.

Patch by Dimitry Andric!

PR15086

llvm-svn: 195318
2013-11-21 07:04:30 +00:00
Paul Robinson b379efeb53 Teach ISel not to optimize 'optnone' functions.
Based on work by Andrea Di Biagio.

llvm-svn: 195317
2013-11-21 06:33:32 +00:00
Reed Kotler 2fc05be887 Add, to constant islands, long jumps similar to ARM far branch.
llvm-svn: 195312
2013-11-21 05:13:23 +00:00
Hal Finkel 884bde3031 PPC popcnt[dw] do not have record forms
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.

llvm-svn: 195272
2013-11-20 20:54:55 +00:00
Benjamin Kramer c8160d6523 MachineBlockPlacement: Strengthen the source order bias when picking an exit block.
We now only allow breaking source order if the exit block frequency is
significantly higher than the other exit block. The actual bias is
currently under a flag so the best cut-off can be found; the flag
defaults to the old behavior. The idea is to get some benchmark coverage
over different values for the flag and pick the best one.

When we require the new frequency to be at least 20% higher than the old
frequency I see a 5% speedup on zlib's deflate when compressing a random
file on x86_64/westmere. Hal reported a small speedup on Fhourstones on
a BG/Q and no regressions in the test suite.

The test case is the full long_match function from zlib's deflate. I was
reluctant to add it for previous tweaks to branch probabilities because
it's large and potentially fragile, but changed my mind since it's an
important use case and more likely to break with all the current work
going into the PGO infrastructure.

Differential Revision: http://llvm-reviews.chandlerc.com/D2202

llvm-svn: 195265
2013-11-20 19:08:44 +00:00
Elena Demikhovsky e1f9bf054f AVX-512: Concat 4 128-bit vectors in one 512-bit vector.
llvm-svn: 195229
2013-11-20 09:10:40 +00:00
Hal Finkel 22498fa6e3 PPC: Optimize rldicl generation for masked shifts
Masking operations (where only some number of the low bits are being kept) are
selected to rldicl(x, 0, mb). If x is a logical right shift (which would become
rldicl(y, 64-n, n)), we might be able to fold the two instructions together:

  rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb

The right shift is really a left rotate followed by a mask, and if the explicit
mask is a more-restrictive sub-mask of the mask implied by the shift, only one
rldicl is needed.

llvm-svn: 195185
2013-11-20 01:10:15 +00:00
Cameron McInally d1cd0be6f3 Fix assembly operands for the SSE2 cvtsd2ss instruction.
llvm-svn: 195129
2013-11-19 14:36:00 +00:00
Simon Atanasyan 1093afe27a [Mips] Adjust float ABI settings in case of MIPS16 mode.
Hard float for mips16 means essentially to compile as soft float but to
use a runtime library for soft float that is written with native mips32
floating point instructions (those runtime routines run in mips32 hard
float mode).

The patch reviewed by Reed Kotler.

llvm-svn: 195123
2013-11-19 12:20:17 +00:00
Andrew Trick 0ab5ba8c35 Use symbolic operands in the patchpoint folding routine and fix a spilling bug.
Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled

llvm-svn: 195094
2013-11-19 03:29:59 +00:00
Hao Liu 16edc4675c Implement AArch64 neon instructions class SIMD lsone and SIMD lone-post.
llvm-svn: 195078
2013-11-19 02:17:05 +00:00
Jiangning Liu 0c0c1e8598 Implement AArch64 SISD intrinsics for vget_high and vget_low.
llvm-svn: 195074
2013-11-19 01:46:48 +00:00
Jiangning Liu e329114ae5 Add predicate for AArch64 crypto instructions.
llvm-svn: 195071
2013-11-19 01:38:31 +00:00
Matt Arsenault 3a4d86a1a4 R600/SI: Fix moveToVALU when the first operand is VSrc.
Moving into a VSrc doesn't always work, since it could be
replaced with an SGPR later.

llvm-svn: 195042
2013-11-18 20:09:55 +00:00
Matt Arsenault 08f7e37aa9 R600/SI: Fix multiple SGPR reads when using VCC.
No other SGPR operands are allowed, so if VCC is
used, move the other to a VGPR.

llvm-svn: 195041
2013-11-18 20:09:50 +00:00
Matt Arsenault fb826fa6e1 R600/SI: Implement add i64, but do not yet enable.
Test doesn't actually check the output. I need
to fix add i64 being matched for the addressing
calculations.

llvm-svn: 195040
2013-11-18 20:09:47 +00:00
Matt Arsenault 43b8e4ed3b R600/SI: Move patterns to match add / sub to scalar instructions
llvm-svn: 195034
2013-11-18 20:09:29 +00:00
Tom Stellard 66df8a2c0a R600: Enable the IR structurizer by default
llvm-svn: 195031
2013-11-18 19:43:44 +00:00
Tom Stellard 827ec9b630 R600: Fix a crash in the AMDILCFGStrucurizer
The ifPatternMatch() function was not correctly reporting the number
of matches in some cases.

llvm-svn: 195030
2013-11-18 19:43:38 +00:00
Tom Stellard f340787d79 R600/SI: Fix illegal VGPR->SGPR copy inside of loop
llvm-svn: 195026
2013-11-18 18:50:20 +00:00
Tom Stellard 13de545693 R600/SI: Fix another case of illegal VGPR->SGPR copy
llvm-svn: 195025
2013-11-18 18:50:15 +00:00
NAKAMURA Takumi 4d3457e628 [PR17978] Mark two ARM/fast-isel tests as XFAIL:vg_leak due to GV.
llvm-svn: 195010
2013-11-18 13:50:19 +00:00
Daniel Sanders 08d3cd163d [mips] Fix 'ran out of registers' in MIPS32 with FP64 when generating code for (ConstantFP 0.0)
Fixed an inappropriate use of BuildPairF64 when compiling for MIPS32 with FP64
which resulted in an impossible constraint on the register allocation. It now
uses BuildPairF64_64.

llvm-svn: 195007
2013-11-18 13:12:43 +00:00
Hao Liu 5a4e4e107d Implement the newly added ACLE functions for ld1/st1 with 2/3/4 vectors.
The functions are like: vst1_s8_x2 ...

llvm-svn: 194990
2013-11-18 06:31:53 +00:00
Bill Wendling 1ead4a484a Testcase for PR17964
llvm-svn: 194961
2013-11-17 10:53:19 +00:00
Benjamin Kramer bb1dd73d3e DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants.
llvm-svn: 194959
2013-11-17 10:40:03 +00:00
Andrew Trick 10d5be4e6e Added a size field to the stack map record to handle subregister spills.
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.

llvm-svn: 194942
2013-11-17 01:36:23 +00:00
Matt Arsenault 36f5eb5949 Use right address space pointer size
llvm-svn: 194940
2013-11-17 00:06:39 +00:00
Matt Arsenault dfb3e7092e Fix assert on unaligned access to global with different address space size.
llvm-svn: 194934
2013-11-16 20:50:54 +00:00
Matt Arsenault 19231e630e Fix codegen for null different sized pointer.
llvm-svn: 194932
2013-11-16 20:24:41 +00:00
Vincent Lejeune 745d4298b1 R600: Make dot_4 instructions predicable
llvm-svn: 194927
2013-11-16 16:24:41 +00:00
Ana Pazos d035209bd7 Implemented aarch64 Neon scalar vmulx_lane intrinsics
Implemented aarch64 Neon scalar vfma_lane intrinsics
Implemented aarch64 Neon scalar vfms_lane intrinsics

Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64
intrinsics (v1f64 parameter type) using Neon scalar instructions.

Implemented legacy vfma_lane_f64, vfms_lane_f64,
vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type)
using Neon scalar instructions.

llvm-svn: 194888
2013-11-15 23:32:10 +00:00
Chad Rosier 0c57c3402e [AArch64] Fix the scalar NEON ACLE functions so that they return float/double
rather than the vector equivalent.

llvm-svn: 194853
2013-11-15 21:28:10 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Tom Stellard 519ae39c45 R600/SI: Add VReg_96 register class to SIRegisterInfo::hasVGPRs()
This fixes a crash with GNOME settings manager.

llvm-svn: 194836
2013-11-15 18:26:45 +00:00
Daniel Sanders 6747c64068 [mips][msa] Merge basic_operations_little.ll into basic_operations.ll.
Now that FileCheck supports multiple check prefixes, we don't need to keep the
little and big endian versions of this test separate anymore. Merge them back
together.

llvm-svn: 194826
2013-11-15 17:24:41 +00:00
Cameron McInally ad41f1f693 Add AVX512 unmasked FMA intrinsics and support.
llvm-svn: 194824
2013-11-15 17:01:14 +00:00
Daniel Sanders 50b8041066 Fix illegal DAG produced by SelectionDAG::getConstant() for v2i64 type
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.

In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for.  For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)

This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.

lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.

For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
  that the MIPS tests were falsely passing because a polymorphic function was
  not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
  handle the MIPS MSA element-order (which was previously doing an byte-order
  swap instead of an element-order swap). This left
  isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
  failures involved the bset, bneg, and bclr instructions added in these commits
  using lowerMSASplatImm() in a way that was no longer valid after this patch.
  Some of these were fixed by calling SelectionDAG::getConstant() instead,
  others were fixed by a new function getBuildVectorSplat() that provided the
  removed functionality of lowerMSASplatImm() in a more sensible way.

Reviewers: bkramer

Reviewed By: bkramer

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1973

llvm-svn: 194811
2013-11-15 12:56:49 +00:00
Justin Holewinski 3d49e5c655 [NVPTX] Fix handling of indirect calls
Using a special machine node is cleaner than an InlineAsm node, and fixes an assertion failure in InstrEmitter

llvm-svn: 194810
2013-11-15 12:30:04 +00:00
Daniel Sanders 1ede3002fa [mips][msa] Build all the tests in little and big endian modes and correct an incorrect test.
Summary:
This patch (correctly) breaks some MSA tests by exposing the cases when
SelectionDAG::getConstant() produces illegal types. These have been temporarily
marked XFAIL and the XFAIL flag will be removed when
SelectionDAG::getConstant() is fixed.

There are three categories of failure:
* Immediate instructions are not selected in one endian mode.
* Immediates used in ldi.[bhwd] must be different according to endianness.
  (this only affects cases where the 'wrong' ldi is used to load the correct
   bitpattern. E.g. (bitcast:v2i64 (build_vector:v4i32 ...)))
* Non-immediate instructions that rely on immediates affected by the
  previous two categories as part of their match pattern.
  For example, the bset match pattern is the vector equivalent of
  'ws | (1 << wt)'.

One test needed correcting to expect different output depending on whether big
or little endian was in use. This test was
test/CodeGen/Mips/msa/basic_operations.ll and experiences the second category
of failure shown above. The little endian version of this test is named
basic_operations_little.ll and will be merged back into basic_operations.ll in
a follow up commit now that FileCheck supports multiple check prefixes.

Reviewers: bkramer, jacksprat, dsanders

Reviewed By: dsanders

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1972

llvm-svn: 194806
2013-11-15 11:04:16 +00:00
Alexey Samsonov 07cdeb4a6c Redirect unused test case output to /dev/null
llvm-svn: 194798
2013-11-15 09:36:58 +00:00
Andrew Trick 4f0794fd47 Platform proof a test case.
llvm-svn: 194788
2013-11-15 05:52:56 +00:00