Commit Graph

8360 Commits

Author SHA1 Message Date
Nick Lewycky ce4ca41e47 Fix a typo, in a comment, in a test.
llvm-svn: 192632
2013-10-14 22:02:53 +00:00
Eric Christopher 740025745b Revert part of a fix from 2010, changes since then:
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
   to be generated.

I've also added a fixme for the test case that we should improve
the code generated, it should look something like is documented
in the tls abi document.

llvm-svn: 192631
2013-10-14 21:52:26 +00:00
Will Dietz 5cb7f4e3f2 MachineSink: Fix and tweak critical-edge breaking heuristic.
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one.  The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.

This commit makes a few small changes
to help better realize this goal:

First, modify the loop to ignore registers
defined by this instruction.  We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.

Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)

Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.

Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.

llvm-svn: 192608
2013-10-14 16:57:17 +00:00
Chad Rosier d1f40d760a [AArch64] Add support for NEON scalar integer compare instructions.
llvm-svn: 192596
2013-10-14 14:37:20 +00:00
Bernard Ogden 53169762d0 Add Cortex-A57 support
llvm-svn: 192591
2013-10-14 13:17:07 +00:00
Bernard Ogden 4400cde89a Add subtarget feature support for Cortex-A53
Some previous implicit defaults have changed, for example FP and NEON
are now on by default.

llvm-svn: 192590
2013-10-14 13:16:57 +00:00
Elena Demikhovsky 82a46ebe0a Fixed a bug in dynamic allocation memory on stack.
The alignment of allocated space was wrong, see Bugzila 17345.

Done by Zvi Rackover <zvi.rackover@intel.com>.

llvm-svn: 192573
2013-10-14 07:26:51 +00:00
Vincent Lejeune d6cbede9c5 R600: improve dump of S_WAITCNT
llvm-svn: 192557
2013-10-13 17:56:28 +00:00
Vincent Lejeune fa58a5fb60 R600: Use masked read sel for texture instructions
llvm-svn: 192554
2013-10-13 17:56:10 +00:00
Vincent Lejeune 301beb80d4 R600: fix swizzle export
llvm-svn: 192553
2013-10-13 17:56:04 +00:00
Benjamin Kramer f016258068 Force a CPU on test so it doesn't depend on microarchitectural scheduling decisions.
llvm-svn: 192532
2013-10-12 11:17:12 +00:00
Reed Kotler de64774b4d For Mips16, start to consolidate all forms of 32 bit literal loading so that
they can be better handled and optimized in the Mips16 constant island code.

llvm-svn: 192520
2013-10-12 02:19:08 +00:00
Matt Arsenault 441387832e R600: Add scalar i32 add test
llvm-svn: 192501
2013-10-11 21:03:41 +00:00
Matt Arsenault c15b85715e Use CHECK-LABEL
llvm-svn: 192500
2013-10-11 21:03:39 +00:00
Matthias Braun d616ccc069 Remove kill flags after if conversion if necessary
When if converting something like:
true:
   ... = R0<kill>

false:
   ... = R0<kill>

then the instructions of the true block must not have a <kill> flag
anymore, as the instruction of the false block follow and do still read
the R0 value.
Specifically this patch determines the set of register live-in in the
false block (possibly after simulating the liveness changes of the
duplicated instructions). Each of these live-in registers mustn't be
killed.

llvm-svn: 192482
2013-10-11 19:04:37 +00:00
Quentin Colombet c50b943b31 [DAGCombiner] Load slicing test case: attempt to really fix the buildbots (used sse4.2 instead of avx!).
<rdar://problem/14477220>

llvm-svn: 192480
2013-10-11 18:54:49 +00:00
Quentin Colombet de0e06234c [DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support.
This should fix the buildbots.

Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32

into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192476
2013-10-11 18:29:42 +00:00
Quentin Colombet 5aee63d9e3 [DAGCombiner] Revert load slicing (r192471), until I figure out why it fails on ubuntu.
llvm-svn: 192474
2013-10-11 18:17:17 +00:00
Matthias Braun 77219d8424 Revert "Tests: Be less dependent on a specific schedule/regalloc"
This reverts r192454

Apparently FileCheck isn't as smart as I though and does not enforce a
topological order between variable defs+uses.

llvm-svn: 192472
2013-10-11 18:09:19 +00:00
Quentin Colombet 41dc258f71 [DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
 a = load i64* addr
 b = trunc i64 a to i32
 c = lshr i64 a, 32
 d = trunc i64 c to i32

into:
 b = load i32* addr1
 d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192471
2013-10-11 18:01:14 +00:00
Amara Emerson ac6950863f [ARM] Fix FP ABI attributes with no VFP enabled.
llvm-svn: 192458
2013-10-11 16:03:43 +00:00
Matthias Braun 94b88b8851 Tests: Be less dependent on a specific schedule/regalloc
llvm-svn: 192454
2013-10-11 15:40:12 +00:00
Matheus Almeida 49b7564717 [mips][msa] Improves robustness of the test by enhancing pattern matching.
llvm-svn: 192446
2013-10-11 13:18:01 +00:00
Justin Holewinski a51418c1ca [NVPTX] Switch from StrongPHIElimination to PHIElimination in NVPTXTargetMachine, and add some missing optimization passes to addOptimizedRegAlloc
Fixes PR17529

llvm-svn: 192445
2013-10-11 12:39:39 +00:00
Justin Holewinski 660597d190 Make AsmPrinter::emitImplicitDef a virtual method so targets can emit custom comments for implicit defs
For NVPTX, this fixes a crash where the emitImplicitDef implementation was expecting physical registers,
while NVPTX uses virtual registers (with a couple of exceptions).  Now, the implicit def comment will be
emitted as a true PTX register name. Other targets can use this to customize the output of implicit def
comments.

Fixes PR17519

llvm-svn: 192444
2013-10-11 12:39:36 +00:00
Amara Emerson cc00dd4d5d [ARM] Add a test case for disabled neon/fpu features.
llvm-svn: 192440
2013-10-11 11:07:00 +00:00
Daniel Sanders 50e5ed3d08 [mips][msa] Added support for matching maddv.[bhwd], and msubv.[bhwd] from normal IR (i.e. not intrinsics)
llvm-svn: 192438
2013-10-11 10:50:42 +00:00
Daniel Sanders e67bd87c48 [mips][msa] Added support for matching fmsub.[wd] from normal IR (i.e. not intrinsics)
llvm-svn: 192435
2013-10-11 10:27:32 +00:00
Robert Lytton 4eec834c2b XCore target fix bug in emitArrayBound() causing segmentation fault
llvm-svn: 192434
2013-10-11 10:27:13 +00:00
Robert Lytton 117deb9695 XCore target does not emit '.hidden' or '.protected' attributes
llvm-svn: 192433
2013-10-11 10:27:00 +00:00
Robert Lytton 9715375c25 XCore target: fix bug in XCoreLowerThreadLocal.cpp
When a ConstantExpr which uses a thread local is part of a PHI node
instruction, the insruction that replaces the ConstantExpr must
be inserted in the predecessor block, in front of the terminator instruction.
If the predecessor block has multiple successors, the edge is first split.

llvm-svn: 192432
2013-10-11 10:26:48 +00:00
Robert Lytton 3139db02c8 XCore target: add XCoreTargetLowering::isZExtFree()
llvm-svn: 192431
2013-10-11 10:26:29 +00:00
Daniel Sanders d7103f3187 [mips][msa] Added support for matching fmadd.[wd] from normal IR (i.e. not intrinsics)
llvm-svn: 192430
2013-10-11 10:14:25 +00:00
Daniel Sanders 015972bd95 [mips][msa] Added support for matching ffint_[us].[wd], and ftrunc_[us].[wd] from normal IR (i.e. not intrinsics)
llvm-svn: 192429
2013-10-11 10:00:06 +00:00
Kevin Qin a89e7a0e1c Implement aarch64 neon instruction set AdvSIMD (copy).
llvm-svn: 192410
2013-10-11 02:33:55 +00:00
Matthias Braun b98a950cc6 Tests: Do not unnecessarily depend on kill comments
llvm-svn: 192404
2013-10-10 22:37:49 +00:00
Matthias Braun 72d3607d3c Tests: Use CHECK-LABEL where possible
llvm-svn: 192403
2013-10-10 22:37:47 +00:00
Matt Arsenault 204cfa6e43 R600: Fix trunc i64 to i32 on SI
llvm-svn: 192375
2013-10-10 18:04:16 +00:00
Tom Stellard 70f13dba15 R600/SI: Use -verify-machineinstrs for most tests
We can't enable the verifier for tests with SI_IF and SI_ELSE, because
these instructions are always followed by a COPY which copies their
result to the next basic block.  This violates the machine verifier's
rule that non-terminators can not folow terminators.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192366
2013-10-10 17:11:46 +00:00
Hao Liu 99eac7ee44 Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem).
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).

llvm-svn: 192361
2013-10-10 17:00:52 +00:00
Rafael Espindola 9558af461d Revert "Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem). Including following 14 instructions: 4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers. ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4). 4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers. st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4)."
This reverts commit r192352. It broke the build.

llvm-svn: 192354
2013-10-10 15:15:17 +00:00
Hao Liu 9123ad8ab9 Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem).
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).

llvm-svn: 192352
2013-10-10 15:01:24 +00:00
Benjamin Kramer 20a6a436ae Disable function padding to get this test to pass on atom.
llvm-svn: 192348
2013-10-10 12:46:23 +00:00
Tim Northover 569f69dace ARM: correct liveness flags during ARMLoadStoreOpt
When we had a sequence like:

    s1 = VLDRS [r0, 1], Q0<imp-def>
    s3 = VLDRS [r0, 2], Q0<imp-use,kill>, Q0<imp-def>
    s0 = VLDRS [r0, 0], Q0<imp-use,kill>, Q0<imp-def>
    s2 = VLDRS [r0, 4], Q0<imp-use,kill>, Q0<imp-def>

we were gathering the {s0, s1} loads below the s3 load. This is fine,
but confused the verifier since now the s3 load had Q0<imp-use> with
no definition above it.

This should mark such uses <undef> as well. The liveness structure at
the beginning and end of the block is unaffected, and the true sN
definitions should prevent any dodgy reorderings being introduced
elsewhere.

rdar://problem/15124449

llvm-svn: 192344
2013-10-10 09:28:20 +00:00
Akira Hatanaka 4a3836bed4 [mips] Do not generate INS/EXT nodes if target does not have support for
ins/ext.

llvm-svn: 192330
2013-10-09 23:36:17 +00:00
Venkatraman Govindaraju 8812485d41 [Sparc] Disable tail call optimization for sparc64.
This patch fixes PR17506.

llvm-svn: 192294
2013-10-09 12:50:39 +00:00
Elena Demikhovsky a3a714082b AVX-512: Added VRCP28 and VRSQRT28 instructions and intrinsics.
llvm-svn: 192283
2013-10-09 08:16:14 +00:00
Tim Northover 1fdb076a31 AArch64: enable MISched by default.
Substantial SelectionDAG scheduling is going away soon, and is
interfering with Hao's attempts to implement LDn/STn instructions, so
I say we make the leap first.

There were a few reorderings (inevitably) which broke some tests. I
tried to replace them with CHECK-DAG variants mostly, but some too
complex for that to be useful and I just reordered them.

llvm-svn: 192282
2013-10-09 07:53:57 +00:00
Tim Northover 74cf0bd77d AArch64: migrate ADRP relaxation test to be llvm-mc only.
llvm-svn: 192281
2013-10-09 07:53:49 +00:00
Craig Topper bc749db947 Add in64BitMode/in32BitMode to the MMX/SSE2/AVX maskmovq/dq instructions. This way the asm parser will pick the right one based on the mode. Instruction selection already did the right thing based on the pointer size.
llvm-svn: 192266
2013-10-09 02:18:34 +00:00