Commit Graph

8199 Commits

Author SHA1 Message Date
Richard Sandiford e3827751e2 [SystemZ] Fix handling of 64-bit memcmp results
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did.  I've split this out into a
subroutine so that it can be used for other upcoming patches.

I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads.  I don't have a testcase for that because
we don't do any interesting scheduling on z yet.

llvm-svn: 188540
2013-08-16 10:55:47 +00:00
Richard Sandiford a59012577c [SystemZ] Fix sign of integer memcmp result
r188163 used CLC to implement memcmp.  Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM.  The sequence I'd used was:

   ipm <reg>
   sll <reg>, 2
   sra <reg>, 30

but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero.  This sequence should only be used if the
CLC arguments are reversed to compensate.  The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.

Rather than do that, I went for a different sequence that works with
the natural CLC order:

   ipm <reg>
   srl <reg>, 28
   rll <reg>, <reg>, 31

One advantage of this is that it doesn't clobber CC.  A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.

llvm-svn: 188538
2013-08-16 10:22:54 +00:00
Craig Topper 8c929627d9 Don't use v16i32 for load pattern matching. All 512-bit loads are cated to v8i64.
llvm-svn: 188534
2013-08-16 06:07:34 +00:00
Daniel Dunbar c7581db4b9 [tests] Add a hack to eliminate some dangling .s files on buildbots.
- Benjamin fixed the emission of this file in r179937, but it still lives on a
   few buildbots. We should probably clean up the build dirs once in a while,
   eh?

llvm-svn: 188527
2013-08-16 02:54:00 +00:00
Daniel Dunbar f296f31319 [tests] Remove an out-dated failing test.
llvm-svn: 188526
2013-08-16 02:53:29 +00:00
Tom Stellard dba25713a6 Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions"
This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f.
This is the wrong version of this fix.

llvm-svn: 188523
2013-08-16 01:18:43 +00:00
Tom Stellard 82bef57f20 R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write).  This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.

llvm-svn: 188522
2013-08-16 01:12:20 +00:00
Tom Stellard b03edeca67 R600: Add support for global vector loads with element types less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188521
2013-08-16 01:12:16 +00:00
Tom Stellard fbab827e2a R600: Add support for global vector stores with elements less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188520
2013-08-16 01:12:11 +00:00
Tom Stellard d3ee8c103a R600: Add support for i16 and i8 global stores
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188519
2013-08-16 01:12:06 +00:00
Tom Stellard 6d1379e180 R600: Add support for v4i32 stores on Cayman
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188518
2013-08-16 01:12:00 +00:00
Tom Stellard 16da74c205 R600: Enable folding of inline literals into REQ_SEQUENCE instructions
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188517
2013-08-16 01:11:55 +00:00
Tom Stellard ac00f9df79 R600: Change the RAT instruction assembly names so they match the docs
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188515
2013-08-16 01:11:46 +00:00
Daniel Dunbar 9efbedfd35 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Jack Carter d12e837f05 [Mips][msa] Added the simple builtins (madd_q to xori)
Includes:
madd_q, maddr_q, maddv, max_[asu], maxi_[su], min_[asu], mini_[su], mod_[su],
msub_q, msubr_q, msubv, mul_q, mulr_q, mulv, nloc, nlzc, nori, ori, pckev,
pckod, pcnt, sat_[su], shf, sld, sldi, sll, slli, splat, splati, sr[al],
sr[al]i, subs_[su], subss_u, subus_s, subv, subvi, vshf, xori

Patch by Daniel Sanders

llvm-svn: 188460
2013-08-15 14:22:07 +00:00
Jack Carter b95ee69163 [Mips][msa] Added the simple builtins (fadd to ftq)
Includes:
fadd, fceq, fcg[et], fclass, fcl[et], fcne, fcun, fdiv, fexdo, fexp2,
fexup[lr], ffint_[su], ffql, ffqr, fill, flog2, fmadd, fmax, fmax_a, fmin,
fmin_a, fmsub, fmul, frint, frcp, frsqrt, fseq, fsge, fsgt, fsle, fslt,
fsne, fsqr, fsub, ftint_s, ftq

Patch by Daniel Sanders

llvm-svn: 188458
2013-08-15 13:45:36 +00:00
Jack Carter babdcc8c2c [Mips][msa] Added the simple builtins (add_a to dpsub[su], ilvev to ldi)
Includes:
add_a, adds_[asu], addv, addvi, andi.b, asub_[su].[bhwd], aver?_[su]_[bhwd],
bclr, bclri, bins[lr], bins[lr]i, bmnzi, bmzi, bneg, bnegi, bseli, bset, bseti,
c(eq|ne), c(eq|ne)i, cl[et]_[su], cl[et]i_[su], copy_[su].[bhw], div_[su],
dotp_[su], dpadd_[su], dpsub_[su], ilvev, ilvl, ilvod, ilvr, insv, insve,
ldi

Patch by Daniel Sanders

llvm-svn: 188457
2013-08-15 12:24:57 +00:00
Craig Topper 8dbc7e9d35 Revert r188449 as it turns out we're just missing the instructions that need the v16i32/v16f32 matching.
llvm-svn: 188454
2013-08-15 08:38:25 +00:00
Hao Liu cd8b02dce3 Clang and AArch64 backend patches to support shll/shl and vmovl instructions and ACLE functions
llvm-svn: 188451
2013-08-15 08:26:11 +00:00
Craig Topper 2ffd06528d Don't let isPermImmMask handle v16i32 since VPERMI doesn't match on that type. Remove 128-bit vector handling from isPermImmMask too, it's covered by isPSHUFDMask.
llvm-svn: 188449
2013-08-15 07:30:51 +00:00
Tom Stellard d86003e31f R600/SI: Improve legalization of vector operations
This should fix hangs in the OpenCL piglit tests.

llvm-svn: 188431
2013-08-14 23:25:00 +00:00
Tom Stellard 6785065ace R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics
llvm-svn: 188430
2013-08-14 23:24:53 +00:00
Tom Stellard 9fa1791a1b R600/SI: Convert v16i8 resource descriptors to i128
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.

This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:

https://bugs.freedesktop.org/show_bug.cgi?id=66805

llvm-svn: 188429
2013-08-14 23:24:45 +00:00
Tom Stellard b81df0c7ea R600/SI: Use i8 types for resource descriptors in tests
We switched from i32 to i8 types a while ago and the tests were never
updated.

llvm-svn: 188428
2013-08-14 23:24:37 +00:00
Tom Stellard 8e5da41374 R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.

v2:
  - Use an SGPR register class if all the operands of BUILD_VECTOR are
    SGPRs.

llvm-svn: 188427
2013-08-14 23:24:32 +00:00
Tom Stellard 16a9a205c8 R600/SI: Assign a register class to the $vaddr operand for MIMG instructions
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.

llvm-svn: 188425
2013-08-14 23:24:17 +00:00
Tom Stellard 3494b7ee42 R600/SI: Handle MSAA texture targets
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188421
2013-08-14 22:22:14 +00:00
Tom Stellard 20ee94f152 R600/SI: Allow conversion between v32i8 and v8i32
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188420
2013-08-14 22:22:09 +00:00
Tom Stellard 73c31d541e R600/SI: Add pattern for fp_to_uint
This fixes the F2U opcode for the Mesa driver.

Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188418
2013-08-14 22:21:57 +00:00
Hal Finkel b3ca00d2a3 Actually fix PPC64 64-bit GPR inline asm constraint matching
This is a follow-up to r187693, correcting that code to request the correct
register class. The previous version, with the wrong register class, was not
really correcting the constraints, but rather was removing them. Coincidentally,
this fixed the failing test case in r187693, but obviously created other
problems.

llvm-svn: 188407
2013-08-14 20:05:04 +00:00
Renato Golin b184cd99ba Let t2LDRBi8 and t2LDRBi12 have same Base Pointer
When determining if two different loads are from the same base address,
this patch allows one load to use a t2LDRi8 address mode and another to
use a t2LDRi12 address mode. The current implementation is very
conservative and this allows the case of differing Thumb2 byte loads to
be considered. Allowing these differing modes instead of forcing the exact
same opcode is useful for situations where one opcodes loads from a base
address+1 and a second opcode loads for a base address-1.

Patch by Daniel Stewart.

llvm-svn: 188385
2013-08-14 16:35:29 +00:00
NAKAMURA Takumi 89c1bfbd9d llvm/test/CodeGen/X86/setcc-sentinals.ll: Relax expressions for x86_64-win32.
llvm-svn: 188340
2013-08-14 00:46:00 +00:00
Akira Hatanaka 7473b4705a [mips] Properly parse registers that appear in inline-asm constraints.
llvm-svn: 188336
2013-08-14 00:21:25 +00:00
Jim Grosbach 327ccc787e DAG: Combine (and (setne X, 0), (setne X, -1)) -> (setuge (add X, 1), 2)
A common idiom is to use zero and all-ones as sentinal values and to
check for both in a single conditional ("x != 0 && x != (unsigned)-1").
That generates code, for i32, like:
  testl %edi, %edi
  setne %al
  cmpl  $-1, %edi
  setne %cl
  andb  %al, %cl

With this transform, we generate the simpler:
  incl  %edi
  cmpl  $1, %edi
  seta  %al

Similar improvements for other integer sizes and on other platforms. In
general, combining the two setcc instructions into one is better.

rdar://14689217

llvm-svn: 188315
2013-08-13 21:30:58 +00:00
Elena Demikhovsky 60b1f289f2 AVX-512: Added CMP and BLEND instructions.
Lowering for SETCC.

llvm-svn: 188265
2013-08-13 13:24:07 +00:00
Tom Stellard fc455471c3 R600: Set scheduling preference to Sched::Source
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler.  Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:

With Sched::VLIW

Total Compile Time:                  1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)

With Sched::Source

Total Compile Time:                  0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)

The code ouput was identical with both schedulers.  This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.

llvm-svn: 188215
2013-08-12 22:33:21 +00:00
Tim Northover 501977eb7a Fix FileCheck --check-prefix lines.
Various tests had sprung up over the years which had --check-prefix=ABC on the
RUN line, but "CHECK-ABC:" later on. This happened to work before, but was
strictly incorrect. FileCheck is getting stricter soon though.

Patch by Ron Ofir.

llvm-svn: 188173
2013-08-12 12:43:26 +00:00
Richard Sandiford 564681c88d [SystemZ] Use CLC and IPM to implement memcmp
For now this is restricted to fixed-length comparisons with a length
in the range [1, 256], as for memcpy() and MVC.

llvm-svn: 188163
2013-08-12 10:28:10 +00:00
Tim Northover 707d68f082 Allow compatible extension attributes for tail calls
If the tail-callee and caller give the same bits via the same signext/zeroext
attribute then a tail-call should be allowed, since the extension has already
been done by the callee.

llvm-svn: 188159
2013-08-12 09:45:46 +00:00
Reed Kotler d265e88827 Don't generate floating point stubs for mips16 code if the function
is actually an instrinsic that will not occur in libc. This list here
is not exhaustive but fixes the one places in test-suite where this occurs.
I have filed a bug against myself to research the full list and add them
to the array of such cases. In the future, actual stub generation will occur
in a later phase and we won't need this code because we will know at that time
during the compilation that in fact no helper function was even needed.

llvm-svn: 188149
2013-08-11 21:30:27 +00:00
Elena Demikhovsky 5fed3b95db AVX-512: Added more tests for BROADCAST
llvm-svn: 188148
2013-08-11 12:29:16 +00:00
Elena Demikhovsky cf5b1458e6 AVX-512: Added VPERM* instructons and MOV* zmm-to-zmm instructions.
Added a test for shuffles using VPERM.

llvm-svn: 188147
2013-08-11 07:55:09 +00:00
Niels Ole Salscheider d3a039fed2 R600/SI: FMA is faster than fmul and fadd for f64
llvm-svn: 188136
2013-08-10 10:38:54 +00:00
Niels Ole Salscheider 6509ac65a9 R600/SI: Add FMA pattern
llvm-svn: 188135
2013-08-10 10:38:47 +00:00
Reed Kotler be316cffa7 Add another intrinsic that LLVM gives an incorrect prototype to.
I need to go through all the runtime routine list and see if there
are any more I need to add for mips16 floating point. Prototypes must
be correct or else I don't know to add a helper function call.

llvm-svn: 188106
2013-08-09 21:33:41 +00:00
Michael Gottesman 8afcf3a408 [stackprotector] Simplify SP Pass so that we emit different fail basic blocks for each fail condition.
This patch decouples the stack protector pass so that we can support stack
protector implementations that do not use the IR level generated stack protector
fail basic block.

No codesize increase is caused by this change since the MI level tail merge pass
properly merges together the fail condition blocks (see the updated test).

llvm-svn: 188105
2013-08-09 21:26:18 +00:00
Stephen Lin 5532f9a9c3 CHECK-LABEL-ify tests
llvm-svn: 188087
2013-08-09 17:50:15 +00:00
Craig Topper 215b00a66a Add missing 'v' prefix in front of palignr on one of checks.
llvm-svn: 188054
2013-08-09 05:41:12 +00:00
Hal Finkel 8ec43c6a0f Set ISD::FROUND to Expand by default for all types
For most libm ISD nodes, TargetLoweringBase::initActions sets the default
scalar-type action to Expand, and leaves the vector-type action default as
Legal. This is not appropriate for the new ISD::FROUND node (which no backend
but PowerPC handles explicitly).

Fixes PR16842.

llvm-svn: 188048
2013-08-09 04:13:44 +00:00
Arnold Schwaighofer c31c2de18b Revert "Reapply r185872 now that the address sanitizer has been changed to support this."
This reverts commit r187939. It broke an O0 build of a spec benchmark.

llvm-svn: 188012
2013-08-08 21:04:16 +00:00
David Fang b88cdf62f5 initial draft of PPCMachObjectWriter.cpp
this records relocation entries in the mach-o object file
for PIC code generation.
tested on powerpc-darwin8, validated against darwin otool -rvV

llvm-svn: 188004
2013-08-08 20:14:40 +00:00
Niels Ole Salscheider 719fbc9ae7 R600/SI: Implement fp32<->fp64 conversions
llvm-svn: 187988
2013-08-08 16:06:15 +00:00
Niels Ole Salscheider 4715d886f8 R600/SI: Implement sint<->fp64 conversions
llvm-svn: 187987
2013-08-08 16:06:08 +00:00
Andrea Di Biagio 612789ac97 test commit.
llvm-svn: 187974
2013-08-08 10:46:36 +00:00
Eric Christopher 0df08e2ff9 Make sure that if we're going to attempt to add a type to a DIE that
the type exists.

Fix up cases where we weren't checking for optional types and add
an assert to addType to make sure we catch this in the future.

Fix up a testcase that was using the tag for DW_TAG_array_type
when it meant DW_TAG_enumeration_type.

llvm-svn: 187963
2013-08-08 07:40:37 +00:00
Hal Finkel 2b7b2f373b PPC: Map frin to round() not nearbyint() and rint()
Making use of the recently-added ISD::FROUND, which allows for custom lowering
of round(), the PPC backend will now map frin to round(). Previously, we had
been using frin to lower nearbyint() (and rint() via some custom lowering to
handle the extra fenv flags requirements), but only in fast-math mode because
frin does not tie-to-even. Several users had complained about this behavior,
and this new mapping of frin to round is certainly more appropriate (and does
not require fast-math mode).

In effect, this reverts r178362 (and part of r178337, replacing the nearbyint
mapping with the round mapping).

llvm-svn: 187960
2013-08-08 04:31:34 +00:00
Bill Wendling b80f9791e4 Reapply r185872 now that the address sanitizer has been changed to support this.
Original commit message:

Stop emitting weak symbols into the "coal" sections.

The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.

Replace the sections like this:

  __TEXT/__textcoal_nt   instead use  __TEXT/__text
  __TEXT/__const_coal    instead use  __TEXT/__const
  __DATA/__datacoal_nt   instead use  __DATA/__data

<rdar://problem/14265330>

llvm-svn: 187939
2013-08-07 23:42:09 +00:00
Elena Demikhovsky 45c54ad8dc AVX-512 set: Added BROADCAST instructions
with lowering logic and a test.

llvm-svn: 187884
2013-08-07 12:34:55 +00:00
Richard Sandiford 0897fce2f4 [SystemZ] Optimize floating-point comparisons with zero
This follows the same lines as the integer code.  In the end it seemed
easier to have a second 4-bit mask in TSFlags to specify the compare-like
CC values.  That eats one more TSFlags bit than adding a CCHasUnordered
would have done, but it feels more concise.

llvm-svn: 187883
2013-08-07 11:10:06 +00:00
Richard Sandiford 9f11bc1956 [SystemZ] Add floating-point load-and-test instructions
These instructions can also be used as comparisons with zero.

llvm-svn: 187882
2013-08-07 11:03:34 +00:00
Reed Kotler bb870e20e2 Create a pattern for the "trap" instruction.
llvm-svn: 187863
2013-08-07 04:00:26 +00:00
Tom Stellard 2f7cdda57e R600/SI: Use VSrc_* register classes as the default classes for types
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:

SGPR = COPY VGPR

Will now be emitted like this:

VSrC = COPY VGPR

This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur.  Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.

llvm-svn: 187831
2013-08-06 23:08:28 +00:00
Tom Stellard 4c0ffccbbf R600/SI: Add more special cases for opcodes to ensureSRegLimit()
Also factor out the register class lookup to its own function.

llvm-svn: 187830
2013-08-06 23:08:18 +00:00
Manman Ren b75e0c92f3 Debug Info Finder|Verifier: handle DbgLoc attached to instructions.
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.

Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.

TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.

Re-apply r187609 with fix to pass ocaml binding. vmcore.ml generates a debug
location with scope being metadata !{}, in verifier we treat this as a null
scope.

llvm-svn: 187812
2013-08-06 19:38:43 +00:00
Hal Finkel 11b9e452f6 Add PPC64 mulli pattern
The PPC backend had been missing a pattern to generate mulli for 64-bit
multiples. We had been generating it only for 32-bit multiplies. Unfortunately,
generating li + mulld unnecessarily increases register pressure.

llvm-svn: 187807
2013-08-06 17:03:03 +00:00
Justin Holewinski debe686f05 [NVPTX] Add missing patterns for i1 [s,u]int_to_fp
llvm-svn: 187800
2013-08-06 14:13:34 +00:00
Justin Holewinski 871ec93909 [NVPTX] Fix bug in stack code generation causes by MC conversion
We do use a very small set of physical registers, so account for
them in the virtual register encoding between MachineInstr and MC

llvm-svn: 187799
2013-08-06 14:13:31 +00:00
Justin Holewinski a2a63d28df [NVPTX] Start conversion to MC infrastructure
This change converts the NVPTX target to use the MC infrastructure
instead of directly emitting MachineInstr instances. This brings
the target more up-to-date with LLVM TOT, and should fix PR15175
and PR15958 (libNVPTXInstPrinter is empty) as a side-effect.

llvm-svn: 187798
2013-08-06 14:13:27 +00:00
Tim Northover cc2e903bda ARM: implement allowTruncateForTailCall
Now that it's in place, it seems silly not to let ARM make use of the extra
tail call opportunities.

llvm-svn: 187795
2013-08-06 13:58:03 +00:00
Tim Northover a4415854db Refactor isInTailCallPosition handling
This change came about primarily because of two issues in the existing code.
Niether of:

define i64 @test1(i64 %val) {
  %in = trunc i64 %val to i32
  tail call i32 @ret32(i32 returned %in)
  ret i64 %val
}

define i64 @test2(i64 %val) {
  tail call i32 @ret32(i32 returned undef)
  ret i32 42
}

should be tail calls, and the function sameNoopInput is responsible. The main
problem is that it is completely symmetric in the "tail call" and "ret" value,
but in reality different things are allowed on each side.

For these cases:
1. Any truncation should lead to a larger value being generated by "tail call"
   than needed by "ret".
2. Undef should only be allowed as a source for ret, not as a result of the
   call.

Along the way I noticed that a mismatch between what this function treats as a
valid truncation and what the backends see can lead to invalid calls as well
(see x86-32 test case).

This patch refactors the code so that instead of being based primarily on
values which it recurses into when necessary, it starts by inspecting the type
and considers each fundamental slot that the backend will see in turn. For
example, given a pathological function that returned {{}, {{}, i32, {}}, i32}
we would consider each "real" i32 in turn, and ask if it passes through
unchanged. This is much closer to what the backend sees as a result of
ComputeValueVTs.

Aside from the bug fixes, this eliminates the recursion that's going on and, I
believe, makes the bulk of the code significantly easier to understand. The
trade-off is the nasty iterators needed to find the real types inside a
returned value.

llvm-svn: 187787
2013-08-06 09:12:35 +00:00
Tom Stellard aa664d9b92 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Tom Stellard eef2ad92c7 R600/SI: Add missing test for r187749
llvm-svn: 187754
2013-08-05 22:45:56 +00:00
Richard Sandiford c212125d27 [SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences
This patch just uses a peephole test for "add; compare; branch" sequences
within a single block.  The IR optimizers already convert loops to
decrement-and-branch-on-nonzero form in some cases, so even this
simplistic test triggers many times during a clang bootstrap and
projects/test-suite run.  It looks like there are still cases where we
need to more strongly prefer branches on nonzero though.  E.g. I saw a
case where a loop that started out with a check for 0 ended up with a
check for -1.  I'll try to look at that sometime.

I ended up adding the Reference class because MachineInstr::readsRegister()
doesn't check for subregisters (by design, as far as I could tell).

llvm-svn: 187723
2013-08-05 11:23:46 +00:00
Richard Sandiford b49a3ab262 [SystemZ] Use LOAD AND TEST to eliminate comparisons against zero
llvm-svn: 187720
2013-08-05 11:03:20 +00:00
Elena Demikhovsky 40864b690b AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types.
Added intrinsics and tests.

llvm-svn: 187717
2013-08-05 08:52:21 +00:00
Reed Kotler 9c285b300d Add the saving of S2. This is needed for some of the floating point
helper functions. This can be optimized out later when the remaining
parts of the helper function work is moved into the Mips16HardFloat pass.
For now it forces us to use the 32 bit save/restore instructions instead
of the 16 bit ones.

llvm-svn: 187712
2013-08-04 23:56:53 +00:00
Benjamin Kramer 5bc180c14f X86: Turn fp selects into mask operations.
double test(double a, double b, double c, double d) { return a<b ? c : d; }

before:
_test:
	ucomisd	%xmm0, %xmm1
	ja	LBB0_2
	movaps	%xmm3, %xmm2
LBB0_2:
	movaps	%xmm2, %xmm0

after:
_test:
	cmpltsd	%xmm1, %xmm0
	andpd	%xmm0, %xmm2
	andnpd	%xmm3, %xmm0
	orpd	%xmm2, %xmm0

Small speedup on Benchmarks/SmallPT

llvm-svn: 187706
2013-08-04 12:05:16 +00:00
Elena Demikhovsky cd46691728 AVX-512 set: added VEXTRACTPS instruction
llvm-svn: 187705
2013-08-04 10:46:07 +00:00
Tim Northover adb550068a X86: specify CPU on new test to fix atom buildbot
Apparently Atoms use lea for stack adjustment, which we weren't
looking for.

llvm-svn: 187704
2013-08-04 10:00:45 +00:00
Tim Northover ecc018c7b7 X86: correct tail return address calculation
Due to the weird and wondeful usual arithmetic conversions, some
calculations involving negative values were getting performed in
uint32_t and then promoted to int64_t, which is really not a good
idea.

Patch by Katsuhiro Ueno.

llvm-svn: 187703
2013-08-04 09:35:57 +00:00
Reed Kotler 30cedf65ef Clean up code for Mips16 large frame handling.
llvm-svn: 187701
2013-08-04 01:13:25 +00:00
Hal Finkel b176acb6b7 Fix PPC64 64-bit GPR inline asm constraint matching
Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the
64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with
explicit register names, on PPC64 when an i64 MVT has been requested, we need
to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent)
registers.

At some point, we'll probably want to arrange things so that the generic code
in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order
to match these inline asm register constraints. If we do that, this change can
be reverted.

llvm-svn: 187693
2013-08-03 12:25:10 +00:00
Akira Hatanaka 7be35cb1bf [mips] Expand vector truncating stores and extending loads.
llvm-svn: 187667
2013-08-02 19:23:33 +00:00
Eric Christopher cdc78961d3 Temporarily revert "Debug Info Finder|Verifier: handle DbgLoc attached to
instructions." in an attempt to bring back some bots.

This reverts commit r187609.

llvm-svn: 187638
2013-08-02 00:49:44 +00:00
Bill Wendling a5c536e1ee Use function attributes to indicate that we don't want to realign the stack.
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.

llvm-svn: 187618
2013-08-01 21:42:05 +00:00
Reed Kotler 83f879ddb2 Fix some issues with Mips16 floating when certain intrinsics are present.
This is actually an LLVM bug in the way it generates signatures for these
when soft float is enabled. For example, floor ends up having the signature
of int64(int64). The signature part is not the same as where the actual
parameter types are recorded, and those ARE of course int64(int64) when
soft float is enabled. (Yes, Mips16 hard float uses soft float but with
different runtime rounes but then has to interoperate with Mips32 using
normal floating point). This logic will eventually be moved to the 
Mips16HardFloat pass so it's not worth sorting out these issues in LLVM
since nobody but Mips16 cares about these signatures, as far as I know,
and even I won't eventually either.

llvm-svn: 187613
2013-08-01 21:17:53 +00:00
Manman Ren 4c065e779c Debug Info Finder|Verifier: handle DbgLoc attached to instructions.
Also remove checking of llvm.dbg.sp since it is not used in generating dwarf.

Current state of Finder:
DebugInfoFinder tries to list all debug info MDNodes used in a module. To
list debug info MDNodes used by an instruction, DebugInfoFinder provides
processDeclare, processValue and processLocation to handle DbgDeclareInst,
DbgValueInst and DbgLoc attached to instructions. processModule will go
through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes
used by the CUs.

TODO:
1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We
need to add a list of variables that are used by DbgDeclareInst and
DbgValueInst.
2> MDString fields should be null or isa<MDString> and MDNode fields should be
null or isa<MDNode>. We currently use empty string or int 0 to represent null.
3> Go though Verify functions and make sure that they check field types.
4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each
testing case has a llvm.dbg.cu.

llvm-svn: 187609
2013-08-01 20:52:39 +00:00
Tom Stellard 0344cdfe39 R600: Add 64-bit float load/store support
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions

Tom Stellard:
  - Mark vec2 operations as expand.  The addition of a vec2 register
    class made them all legal.

Patch by: Dmitry Cherkassov

Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
2013-08-01 15:23:42 +00:00
Tom Stellard 53698938a4 R600: Use 64-bit alignment for 64-bit kernel arguments
llvm-svn: 187581
2013-08-01 15:23:31 +00:00
Tom Stellard 98f675a994 R600/SI: Custom lower i64 ZERO_EXTEND
llvm-svn: 187580
2013-08-01 15:23:26 +00:00
Richard Sandiford fd7f4ae6d4 [SystemZ] Reuse CC results for integer comparisons with zero
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually.  I think this was latent until now, but is tested
by int-cmp-45.c.  It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.

llvm-svn: 187573
2013-08-01 10:39:40 +00:00
Richard Sandiford a075708abe [SystemZ] Prefer comparisons with zero
Convert >= 1 to > 0, etc.  Using comparison with zero isn't a win on its own,
but it exposes more opportunities for CC reuse (the next patch).

llvm-svn: 187571
2013-08-01 10:29:45 +00:00
Tim Northover 40e9efd725 AArch64: add initial NEON support
Patch by Ana Pazos.

- Completed implementation of instruction formats:
AdvSIMD three same
AdvSIMD modified immediate
AdvSIMD scalar pairwise

- Completed implementation of instruction classes
(some of the instructions in these classes
belong to yet unfinished instruction formats):
Vector Arithmetic
Vector Immediate
Vector Pairwise Arithmetic

- Initial implementation of instruction formats:
AdvSIMD scalar two-reg misc
AdvSIMD scalar three same

- Intial implementation of instruction class:
Scalar Arithmetic

- Initial clang changes to support arm v8 intrinsics.
Note: no clang changes for scalar intrinsics function name mangling yet.

- Comprehensive test cases for added instructions
To verify auto codegen, encoding, decoding, diagnosis, intrinsics.

llvm-svn: 187567
2013-08-01 09:20:35 +00:00
Robert Lytton 4be00f8ad1 XCore target: Fix Vararg handling
llvm-svn: 187565
2013-08-01 08:29:44 +00:00
Robert Lytton 4e60a3f4e3 XCore target: Add byval handling
llvm-svn: 187563
2013-08-01 08:18:55 +00:00
Robert Lytton b4787a159d Xcore target
Fix emitArrayBound() calling OutStreamer.Emit*() multiple times when trying to print a single line

llvm-svn: 187562
2013-08-01 07:52:05 +00:00
Reed Kotler 302ae6b002 Fix some misc. issues with Mips16 fp stubs.
1) They should never be inlined.
2) A naming inconsistency with gcc mips16
3) Stubs should not have the global attribute

llvm-svn: 187555
2013-08-01 02:26:31 +00:00
Tom Stellard ca69a53bae Revert "R600: Non vector only instruction can be scheduled on trans unit"
This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.

llvm-svn: 187526
2013-07-31 20:43:27 +00:00
Vincent Lejeune bb3f931123 R600: Avoid more than 4 literals in the same instruction group at scheduling
llvm-svn: 187515
2013-07-31 19:32:07 +00:00
Vincent Lejeune df18804e26 R600: Non vector only instruction can be scheduled on trans unit
llvm-svn: 187514
2013-07-31 19:31:56 +00:00
Richard Sandiford 791bea4182 [SystemZ] Implement isLegalAddressingMode()
The loop optimizers were assuming that scales > 1 were OK.  I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(),
since it seems to be trying to reject anything that isn't r+i or r+r,
but it has no default case for scales other than 0, 1 or 2.  Implementing
the hook for z means that z can no longer test any change there though.

llvm-svn: 187497
2013-07-31 12:58:26 +00:00
Richard Sandiford ee8343822e [SystemZ] Be more careful about inverting CC masks (conditional loads)
Extend r187495 to conditional loads.  I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.

llvm-svn: 187496
2013-07-31 12:38:08 +00:00
Richard Sandiford 3d768e334b [SystemZ] Be more careful about inverting CC masks
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken.  We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities.  For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2.  If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3.  Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.

Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll.  Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.

The patch also makes it easier to reuse CC results from other instructions.

llvm-svn: 187495
2013-07-31 12:30:20 +00:00
Richard Sandiford 8a757bba10 [SystemZ] Move compare-and-branch generation even later
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare).  It turns out that even
this is a bit too early.  Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed.  They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).

Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch.  This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.

I've added a test for the AnalyzeBranch problem.  A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.

llvm-svn: 187494
2013-07-31 12:11:07 +00:00
Richard Sandiford 6a06ba36ba [SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress()
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source.  I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead.  The AND IMMEDIATE form is shorter
and is less likely to be cracked.

This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register.  The patch uses the z196 instruction RISBLG for this instead.

This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now.  Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.

llvm-svn: 187492
2013-07-31 11:36:35 +00:00
Elena Demikhovsky 67b05fc0b3 Added INSERT and EXTRACT intructions from AVX-512 ISA.
All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.

llvm-svn: 187491
2013-07-31 11:35:14 +00:00
Craig Topper efd67d4612 Changed register names (and pointer keywords) to be lower case when using Intel X86 assembler syntax.
Patch by Richard Mitton.

llvm-svn: 187476
2013-07-31 02:47:52 +00:00
Andrew Trick 3f423dec77 This test may have been sensitive to the ARM ABI...
llvm-svn: 187442
2013-07-30 20:34:59 +00:00
Andrew Trick d9761776bc MI Sched fix: assert "Disconnected LRG within the scheduling region."
llvm-svn: 187435
2013-07-30 19:59:08 +00:00
Tom Stellard aa313d0a74 R600/SI: Expand vector fp <-> int conversions
llvm-svn: 187421
2013-07-30 14:31:03 +00:00
Saleem Abdulrasool 0c2ee5a2cb [ARM] check bitwidth in PerformORCombine
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.

Split the check of the value of the second operands to the ands.  Move the cast
and variable declaration slightly higher to make it slightly easier to follow.

Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
2013-07-30 04:43:08 +00:00
Quentin Colombet e2e0548d77 [R600] Replicate old DAGCombiner behavior in target specific DAG combine.
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.

llvm-svn: 187397
2013-07-30 00:27:16 +00:00
Quentin Colombet 6bf4baa408 [DAGCombiner] insert_vector_elt: Avoid building a vector twice.
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN 

The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
  vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
  optimizations.

llvm-svn: 187396
2013-07-30 00:24:09 +00:00
Manman Ren 620e978f69 Debug Info: enable verifier for testing cases.
llvm-svn: 187375
2013-07-29 20:18:19 +00:00
Manman Ren e9a52e18da Debug Info: update testing cases to pass verifier.
llvm-svn: 187362
2013-07-29 18:12:58 +00:00
Nico Rieck 06d17c80cc Proper va_arg/va_copy lowering on win64
Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like
other 64-bit targets.

llvm-svn: 187355
2013-07-29 13:07:06 +00:00
Silviu Baranga 91ddaa1b48 Allow generation of vmla.f32 instructions when targeting Cortex-A15. The patch also adds the VFP4 feature to Cortex-A15 and fixes the DontUseFusedMAC predicate so that we can still generate vmla.f32 instructions on non-darwin targets with VFP4.
llvm-svn: 187349
2013-07-29 09:25:50 +00:00
Manman Ren 921382ed78 Debug Info Verifier: verify SPs in llvm.dbg.sp.
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.

llvm-svn: 187285
2013-07-27 01:26:08 +00:00
Rafael Espindola f9709e753f next batch of -disable-debug-info-verifier
llvm-svn: 187260
2013-07-26 22:31:26 +00:00
Akira Hatanaka a3d9ab90dc [mips] Implement llvm.trap intrinsic.
Patch by Sasa Stankovic.

llvm-svn: 187244
2013-07-26 20:58:55 +00:00
Manman Ren cc4e4d80fe Debug Info Verifier: enable verification of DICompileUnit.
We used to call Verify before adding DICompileUnit to the list, and now we
remove the check and always add DICompileUnit to the list in DebugInfoFinder,
so we can verify them later on.

llvm-svn: 187237
2013-07-26 20:04:30 +00:00
Akira Hatanaka 53900e5124 [mips] Print instructions "beq", "bne" and "or" using assembler pseudo
instructions "beqz", "bnez" and "move", when possible.

beq $2, $zero, $L1 => beqz $2, $L1
bne $2, $zero, $L1 => bnez $2, $L1
or  $2, $3, $zero  => move $2, $3

llvm-svn: 187229
2013-07-26 18:34:25 +00:00
Justin Holewinski d3f2035a3c Add a target legalize hook for SplitVectorOperand (again)
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

Attempt to fix the buildbots by making the X86 test I just added platform independent

llvm-svn: 187202
2013-07-26 13:28:29 +00:00
Rafael Espindola 1d812728cc Revert "Add a target legalize hook for SplitVectorOperand"
This reverts commit 187198. It broke the bots.

The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".

llvm-svn: 187201
2013-07-26 13:18:16 +00:00
Justin Holewinski f848a24e50 Add a target legalize hook for SplitVectorOperand
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

llvm-svn: 187198
2013-07-26 12:46:39 +00:00
Roman Divacky c3825df87e PPC32 va_list is an actual structure so va_copy needs to copy the whole
structure not just a pointer. This implements that and thus fixes va_copy
on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz!

llvm-svn: 187158
2013-07-25 21:36:47 +00:00
Manman Ren 5873770238 Debug Info: improve the verifier to check field types.
Make sure the context field of DIType is MDNode.
Fix testing cases to make them pass the verifier.

llvm-svn: 187150
2013-07-25 19:33:30 +00:00
Rafael Espindola 729866670b Remove the mblaze backend from llvm.
Approval in here http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064169.html

llvm-svn: 187145
2013-07-25 18:55:05 +00:00
Andrew Trick 8bb0a251fd Evict local live ranges if they can be reassigned.
The previous change to local live range allocation also suppressed
eviction of local ranges. In rare cases, this could result in more
expensive register choices. This commit actually revives a feature
that I added long ago: check if live ranges can be reassigned before
eviction. But now it only happens in rare cases of evicting a local
live range because another local live range wants a cheaper register.

The benefit is improved code size for some benchmarks on x86 and armv7.

I measured no significant compile time increase and performance
changes are noise.

llvm-svn: 187140
2013-07-25 18:35:19 +00:00
Andrew Trick 8485257d6d Allocate local registers in order for optimal coloring.
Also avoid locals evicting locals just because they want a cheaper register.

Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.

A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors, it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.

Other beneficial side effects:

It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.

Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).

Intuituvely this will make some test cases that are on the threshold
of register pressure more stable.

llvm-svn: 187139
2013-07-25 18:35:14 +00:00
Rafael Espindola cb9afe9ad7 Current batch of -disable-debug-info-verifier.
llvm-svn: 187130
2013-07-25 17:16:05 +00:00
Tim Northover 8f7613ae03 AArch64: add llc-based tests for previous commit.
Better to have tests run even on non-AArch64 platforms.

llvm-svn: 187128
2013-07-25 16:23:55 +00:00
Richard Sandiford c3f85d73ab [SystemZ] Rework compare and branch support
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them.  This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value.  This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.

No behavioral change intended, but it paves the way for a later patch.

llvm-svn: 187116
2013-07-25 09:34:38 +00:00
Richard Sandiford f2404164ba [SystemZ] Add LOCR and LOCGR
llvm-svn: 187113
2013-07-25 09:11:15 +00:00
Richard Sandiford 09a8cf3604 [SystemZ] Add LOC and LOCG
As with the stores, these instructions can trap when the condition is false,
so they are only used for things like (cond ? x : *ptr).

llvm-svn: 187112
2013-07-25 09:04:52 +00:00
Richard Sandiford a68e6f5660 [SystemZ] Add STOC and STOCG
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.

llvm-svn: 187111
2013-07-25 08:57:02 +00:00
Manman Ren e1fb94306d Debug Info: improve the verifier to check field types.
Make sure the context and type fields are MDNodes. We will generate
verification errors if those fields are non-empty strings.
Fix testing cases to make them pass the verifier.

llvm-svn: 187106
2013-07-25 06:43:01 +00:00
Bill Wendling 440e9d81bf Replace the "NoFramePointerElimNonLeaf" target option with a function attribute.
There's no need to specify a flag to omit frame pointer elimination on non-leaf
nodes...(Honestly, I can't parse that option out.) Use the function attribute
stuff instead.

llvm-svn: 187093
2013-07-25 00:34:29 +00:00
Manman Ren ed696c3dc1 Update testing cases to pass debug info verifier.
llvm-svn: 187083
2013-07-24 22:23:00 +00:00
Quentin Colombet bdab227e53 Fix a bug in IfConverter with nested predicates.
Prior to this patch, IfConverter may widen the cases where a sequence of
instructions were executed because of the way it uses nested predicates. This
result in incorrect execution.

For instance, Let A be a basic block that flows conditionally into B and B be a
predicated block.
B can be predicated with A.BrToBPredicate into A iff B.Predicate is less
"permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes
B.Predicate.

The IfConverter was checking the opposite: B.Predicate subsumes
A.BrToBPredicate.

<rdar://problem/14379453>

llvm-svn: 187071
2013-07-24 20:20:37 +00:00
Manman Ren fdfc1ebfbc Debug Info: improve the Finder.
Improve the Finder to handle context of a DIVariable used by DbgValueInst.
Fix testing cases to make them pass the verifier.

llvm-svn: 187052
2013-07-24 17:10:09 +00:00
Manman Ren db5a0a5818 Update testing cases to pass debug info verifier.
llvm-svn: 187049
2013-07-24 15:55:41 +00:00
Manman Ren 3f93d3d61e Update testing cases to make them pass debug info verification.
llvm-svn: 187016
2013-07-24 01:26:37 +00:00
Tom Stellard c54731aa9d DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.

llvm-svn: 187007
2013-07-23 23:55:03 +00:00
Tom Stellard 8cb0e47c9e R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary
These are really the same address space in hardware.  The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access.  When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.

llvm-svn: 187006
2013-07-23 23:54:56 +00:00
Manman Ren 8f1a3cf4c3 Debug Info: improve the Finder.
Improve the Finder to handle context of a DIVariable.
If Scope is a DICompileUnit, add it to the list of CUs.

llvm-svn: 187003
2013-07-23 23:10:00 +00:00
Quentin Colombet 0f2fe74aaf [ARM][ISel] Improve the lowering of vector loads.
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.

This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.

This is a step toward fixing <rdar://problem/14170854>.

llvm-svn: 186999
2013-07-23 22:34:47 +00:00
Tom Stellard 5263948a7b R600: Add support for 24-bit MAD instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186923
2013-07-23 01:48:49 +00:00
Tom Stellard 41fc7853be R600: Add support for 24-bit MUL instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186922
2013-07-23 01:48:42 +00:00
Tom Stellard 9f95033d33 R600: Improve support for < 32-bit loads
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186921
2013-07-23 01:48:35 +00:00
Tom Stellard 840214437b R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select()
This increases the number of opportunites we have for folding.  With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
2013-07-23 01:48:24 +00:00
Tom Stellard 1e80309ebe R600: Use KCache for kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186918
2013-07-23 01:48:18 +00:00
Tom Stellard acfeebf883 R600: Use the same compute kernel calling convention for all GPUs
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
2013-07-23 01:48:05 +00:00
Tom Stellard 78e012969c R600: Use correct LoadExtType when lowering kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186915
2013-07-23 01:47:58 +00:00
Tom Stellard 33dd04bfbe R600: Clean up extended load patterns
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186914
2013-07-23 01:47:52 +00:00
Tom Stellard beed74af48 R600: Expand vector FNEG
llvm-svn: 186913
2013-07-23 01:47:46 +00:00
Manman Ren 9974c88f76 Debug Info Finder: use processDeclare and processValue to list debug info
MDNodes used by DbgDeclareInst and DbgValueInst.

Another 16 testing cases failed and they are disabled with
-disable-debug-info-verifier.
A total of 34 cases are disabled with -disable-debug-info-verifier and will be
corrected.

llvm-svn: 186902
2013-07-23 00:22:51 +00:00
Mihai Popa 8a9da5b00c This adds range checking for "ldr Rn, [pc, #imm]" Thumb
instructions. With this patch:

1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset

llvm-svn: 186831
2013-07-22 15:49:36 +00:00
Justin Holewinski cd069e6dec [NVPTX] Use approximate FP ops when unsafe-fp-math is used, and append
.ftz to instructions if the nvptx-f32ftz attribute is set to "true"

llvm-svn: 186820
2013-07-22 12:18:04 +00:00
Lang Hames 24864fe150 Refactor AnalyzeBranch on ARM. The previous version did not always analyze
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.

This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.

For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll). 

<rdar://problem/14464830>

llvm-svn: 186735
2013-07-19 23:52:47 +00:00
Vincent Lejeune 8b8a7b5514 R600: Don't emit empty then clause and use alu_pop_after
llvm-svn: 186725
2013-07-19 21:45:15 +00:00
Rafael Espindola 9aadcc4c0e s/compiler_used/compiler.used/.
We were incorrectly using compiler_used instead of compiler.used. Unfortunately
the passes using the broken name had tests also using the broken name.

llvm-svn: 186705
2013-07-19 18:44:51 +00:00
Richard Sandiford dd170bd977 [SystemZ] Add tests for ALHSIK and ALGHSIK
The insn definitions themselves crept into r186689, sorry.
This should be the last of the distinct-ops instructions.

llvm-svn: 186690
2013-07-19 16:44:32 +00:00
Richard Sandiford fac8b10a84 [SystemZ] Add ALRK, AGLRK, SLRK and SGLRK
Follows the same lines as r186686, but much more limited, since we only
use ADD LOGICAL for multi-i64 additions.

llvm-svn: 186689
2013-07-19 16:37:00 +00:00
Richard Sandiford 7d6a453623 [SystemZ] Add AHIK and AGHIK
I did these as a separate patch because it uses a slightly different
form of RIE layout.

llvm-svn: 186687
2013-07-19 16:32:12 +00:00
Richard Sandiford c575df6dcc [SystemZ] Add ARK, AGRK, SRK and SGRK
The testsuite changes follow the same lines as for r186683.

llvm-svn: 186686
2013-07-19 16:26:39 +00:00
Richard Sandiford c57e586792 [SystemZ] Add NGRK, OGRK and XGRK
Like r186683, but for 64 bits.

llvm-svn: 186685
2013-07-19 16:24:22 +00:00
Richard Sandiford 0175b4a353 [SystemZ] Add NRK, ORK and XRK
The atomic tests assume the two-operand forms, so I've restricted them to z10.

Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried).  Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.

llvm-svn: 186683
2013-07-19 16:21:55 +00:00
Richard Sandiford ff6c5a5609 [SystemZ] Use SLLK, SRLK and SRAK for codegen
This patch uses the instructions added in r186680 for codegen.

llvm-svn: 186681
2013-07-19 16:12:08 +00:00
Manman Ren 9152f30019 Try to appease the bots.
llvm-svn: 186653
2013-07-19 04:56:51 +00:00
Andrew Trick d97516ddc5 MI Sched: test case fix for previous checkin.
llvm-svn: 186635
2013-07-19 00:31:31 +00:00
Manman Ren 74c61b9c80 Debug Info: enable verifying by default and disable testing cases that fail.
1> Use DebugInfoFinder to find debug info MDNodes.
2> Add disable-debug-info-verifier to disable verifying debug info.
3> Disable verifying for testing cases that fail (will update the testing cases
   later on).
4> MDNodes generated by clang can have empty filename for TAG_inheritance and
   TAG_friend, so DIType::Verify is modified accordingly.

Note that DebugInfoFinder does not list all debug info MDNode.
For example, clang can generate:
metadata !{i32 786468}, which will fail to verify.
This MDNode is used by debug info but not included in DebugInfoFinder.
This MDNode is generated as a temporary node in DIBuilder::createFunction
  Value *TElts[] = { GetTagConstant(VMContext, DW_TAG_base_type) };
  MDNode::getTemporary(VMContext, TElts)

llvm-svn: 186634
2013-07-19 00:31:03 +00:00
Stephen Lin 6f36b45076 Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change.
All changes were made by the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    grep -q "^; *RUN: *llc.*debug" $NAME && continue
    grep -q "^; *RUN:.*llvm-objdump" $NAME && continue
    grep -q "^; *RUN: *opt.*" $NAME && continue
    TEMP=`mktemp -t temp`
    cp $NAME $TEMP
    sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
    while read FUNC; do
      sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
    done
    sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
    sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
    sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
    sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
    mv $TEMP $NAME
  done

This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621.

llvm-svn: 186624
2013-07-18 22:47:09 +00:00
Stephen Lin 98cbca2e4d Disambiguate function names in some CodeGen tests. (Some tests were using function names that also were names of instructions and/or doing other unusual things that were making the test not amenable to otherwise scriptable pattern matching.) No functionality change.
llvm-svn: 186621
2013-07-18 22:29:15 +00:00
Tom Stellard 8374720aad R600/SI: Fix crash with VSELECT
https://bugs.freedesktop.org/show_bug.cgi?id=66175

llvm-svn: 186616
2013-07-18 21:43:53 +00:00
Tom Stellard adf732cfbc R600/SI: Add support for v2f32 loads
llvm-svn: 186615
2013-07-18 21:43:48 +00:00
Tom Stellard ed2f6149f3 R600/SI: Add support for v2f32 stores
llvm-svn: 186614
2013-07-18 21:43:42 +00:00
Tom Stellard 67ae4762ef R600: Expand VSELECT for all types
llvm-svn: 186613
2013-07-18 21:43:35 +00:00
Stephen Lin 3e1f15abc2 Update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change.
llvm-svn: 186594
2013-07-18 18:35:22 +00:00
Joey Gouly 49c7e22366 Forgot 'svn add' again, sorry!
Tests for r186574.

llvm-svn: 186580
2013-07-18 13:17:26 +00:00
Richard Sandiford 5109321042 [SystemZ] Use RNSBG
This should be the last of the R.SBG patches for now.

llvm-svn: 186573
2013-07-18 10:40:35 +00:00
Richard Sandiford 297f7d2724 [SystemZ] Generalize RxSBG SRA case
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift.  This patch splits out that check
and generalises it slightly.  The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.

llvm-svn: 186571
2013-07-18 10:14:55 +00:00
Richard Sandiford 7878b852e6 [SystemZ] Use RXSBG
Extend the previous R.SBG patches to handle XORs.

llvm-svn: 186570
2013-07-18 10:06:15 +00:00
Craig Topper ad1fff9be7 Fix copy and paste bug from r186491 to make v2f64 use MOVAPD/MOVUPD as it should.
llvm-svn: 186566
2013-07-18 07:16:44 +00:00
Hal Finkel 1860763c76 PPC: Support dynamic allocas with large alignment
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).

In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.

llvm-svn: 186562
2013-07-18 04:28:21 +00:00
Hal Finkel f05d6c7843 PPC: Add base-pointer support to builtin setjmp/longjmp
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.

Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.

Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp.  If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.

llvm-svn: 186545
2013-07-17 23:50:51 +00:00
Joey Gouly 8c25b9d86a Add the tests that I forgot to 'svn add' with my previous commit (r186504).
llvm-svn: 186506
2013-07-17 14:03:49 +00:00
Richard Osborne 9ff96e6f9b [XCore] Ensure implicit operands aren't lost on the return instruction.
Patch by Robert Lytton.

llvm-svn: 186500
2013-07-17 10:58:37 +00:00
Craig Topper 4f55b0efd2 Make x86 fast-isel correctly choose between aligned and unaligned operations for vector stores. Fixes PR16640.
llvm-svn: 186491
2013-07-17 05:57:45 +00:00
Hal Finkel 40f76d5830 PPC: Add CTR-register clobber to builtin setjmp
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.

We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.

llvm-svn: 186488
2013-07-17 05:35:44 +00:00
Hal Finkel a7c54e8cf4 PPC: Implement base pointer and stack realignment
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:

  This does not currently work, because the delta between old and new stack
  pointers is added to offsets that reference incoming parameters after the
  prolog is generated, and the code that does that doesn't handle a variable
  delta.  You don't want to do that anyway; a better approach is to reserve
  another register that retains to the incoming stack pointer, and reference
  parameters relative to that.

And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.

We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).

llvm-svn: 186478
2013-07-17 00:45:52 +00:00
NAKAMURA Takumi 4caf019a1a llvm/test/CodeGen/X86/vec_setcc.ll: Add explicit -mtriple=x86_64-unknown-unknown to satisfy win32-targeted configuration.
llvm-svn: 186477
2013-07-17 00:42:37 +00:00
Benjamin Kramer d9508b704d Finally, force the target for this test. Should unbreak non-x86 buildbots.
llvm-svn: 186445
2013-07-16 19:22:07 +00:00
Benjamin Kramer 0edeabfe43 Label names also differ between platforms. Use a relaxed regex.
llvm-svn: 186442
2013-07-16 18:54:21 +00:00
Benjamin Kramer cadc611e93 Fix test not to fail when the target doesn't use leading underscores on symbols.
llvm-svn: 186439
2013-07-16 18:42:01 +00:00
Manman Ren 18ba5b2e0f Cleanup testing case by using a shorter name for types.
llvm-svn: 186436
2013-07-16 18:26:48 +00:00
Juergen Ributzka 3d527d80b8 [X86] Use min/max to optimze unsigend vector comparison on X86
Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.

Reviewer: Ben

radar:5972691

llvm-svn: 186432
2013-07-16 18:20:45 +00:00
Ulrich Weigand 1d4dbda5b9 [APFloat] PR16573: Avoid losing mantissa bits in ppc_fp128 to double truncation
When truncating to a format with fewer mantissa bits, APFloat::convert
will perform a right shift of the mantissa by the difference of the
precision of the two formats.  Usually, this will result in just the
mantissa bits needed for the target format.

One special situation is if the input number is denormal.  In this case,
the right shift may discard significant bits.  This is usually not a
problem, since truncating a denormal usually results in zero (underflow)
after normalization anyway, since the result format's exponent range is
usually smaller than the target format's.

However, there is one case where the latter property does not hold:
when truncating from ppc_fp128 to double.  In particular, truncating
a ppc_fp128 whose first double of the pair is denormal should result
in just that first double, not zero.  The current code however
performs an excessive right shift, resulting in lost result bits.
This is then caught in the APFloat::normalize call performed by
APFloat::convert and causes an assertion failure.

This patch checks for the scenario of truncating a denormal, and
attempts to (possibly partially) replace the initial mantissa
right shift by decrementing the exponent, if doing so will still
result in a valid *target format* exponent.


Index: test/CodeGen/PowerPC/pr16573.ll
===================================================================
--- test/CodeGen/PowerPC/pr16573.ll	(revision 0)
+++ test/CodeGen/PowerPC/pr16573.ll	(revision 0)
@@ -0,0 +1,11 @@
+; RUN: llc < %s | FileCheck %s
+
+target triple = "powerpc64-unknown-linux-gnu"
+
+define double @test() {
+  %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double
+  ret double %1
+}
+
+; CHECK: .quad -9111018957755033591
+
Index: lib/Support/APFloat.cpp
===================================================================
--- lib/Support/APFloat.cpp	(revision 185817)
+++ lib/Support/APFloat.cpp	(working copy)
@@ -1956,6 +1956,23 @@
     X86SpecialNan = true;
   }
 
+  // If this is a truncation of a denormal number, and the target semantics
+  // has larger exponent range than the source semantics (this can happen
+  // when truncating from PowerPC double-double to double format), the
+  // right shift could lose result mantissa bits.  Adjust exponent instead
+  // of performing excessive shift.
+  if (shift < 0 && isFiniteNonZero()) {
+    int exponentChange = significandMSB() + 1 - fromSemantics.precision;
+    if (exponent + exponentChange < toSemantics.minExponent)
+      exponentChange = toSemantics.minExponent - exponent;
+    if (exponentChange < shift)
+      exponentChange = shift;
+    if (exponentChange < 0) {
+      shift -= exponentChange;
+      exponent += exponentChange;
+    }
+  }
+
   // If this is a truncation, perform the shift before we narrow the storage.
   if (shift < 0 && (isFiniteNonZero() || category==fcNaN))
     lostFraction = shiftRight(significandParts(), oldPartCount, -shift);

llvm-svn: 186409
2013-07-16 13:03:25 +00:00
Richard Osborne ab29d19536 [XCore] Fix printing of inline asm operands.
Previously an asm operand with no operand modifier would give the error
"invalid operand in inline asm".

llvm-svn: 186407
2013-07-16 12:48:34 +00:00
Richard Sandiford 885140c951 [SystemZ] Use ROSBG and non-zero form of RISBG for OR nodes
llvm-svn: 186405
2013-07-16 11:55:57 +00:00
Richard Sandiford 82ec87dbdb [SystemZ] Use RISBG for (shift (and ...))
Another patch in the series to make more use of R.SBG.  This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.

llvm-svn: 186399
2013-07-16 11:02:24 +00:00
Tim Northover a7ecd241d2 ARM: implement ldrex, strex and clrex intrinsics
Intrinsics already existed for the 64-bit variants, so these support operations
of size at most 32-bits.

llvm-svn: 186392
2013-07-16 09:46:55 +00:00
Renato Golin 8761069e22 ARM EABI divmod support
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.

Still need to add SREM/UREM support fix for 64-bit lowering.

llvm-svn: 186390
2013-07-16 09:32:17 +00:00
Manman Ren b827123cf7 PEI: Support for non-zero SPAdj at beginning of a basic block.
We can have a FrameSetup in one basic block and the matching FrameDestroy
in a different basic block when we have struct byval. In that case, SPAdj
is not zero at beginning of the basic block.

Modify PEI to correctly set SPAdj at beginning of each basic block using
DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block.

PEI had an assert SPAdjCount || SPAdj == 0.
If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure.
We can add an extra condition to make sure the pairs are matched:
  The pairs start with a FrameSetup.
But since we are doing a much better job in the verifier, this patch removes
the check in PEI.

PR16393

llvm-svn: 186364
2013-07-15 23:47:29 +00:00
Hal Finkel 8e8618ae5c Fix register subclass handling in PPCInstrInfo::insertSelect
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:

  %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
    G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6

then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.

This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.

This problem was found by csmith.

llvm-svn: 186343
2013-07-15 20:22:58 +00:00
Tom Stellard 31209cc8eb R600/SI: Add support for 64-bit loads
https://bugs.freedesktop.org/show_bug.cgi?id=65873

llvm-svn: 186339
2013-07-15 19:00:09 +00:00
Hal Finkel 2f5e8e3d95 Remove invalid assert in DAGTypeLegalizer::RemapValue
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:

  // Note that these invariants may not hold momentarily when processing a node:
  // the node being processed may be put in a map before being marked Processed.

Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.

Thanks to Richard Sandiford for investigating this issue!

Fixes PR16562.

llvm-svn: 186338
2013-07-15 18:57:05 +00:00
Anton Korobeynikov 5714237ca5 Use conventional syntax for branches.
Patch by Job!

llvm-svn: 186291
2013-07-14 18:19:44 +00:00
Anton Korobeynikov fee796d734 Properly lower jump tables on MSP430. Patch by Job Noorman!
llvm-svn: 186283
2013-07-14 15:11:00 +00:00
Stephen Lin d24ab20e9b Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Stephen Lin 552c915e84 Convert Windows to Unix line endings, no functionality change.
llvm-svn: 186264
2013-07-13 22:08:55 +00:00
Stephen Lin f799e3f944 Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion.
This was done with the following sed invocation to catch label lines demarking function boundaries:
    sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.

llvm-svn: 186258
2013-07-13 20:38:47 +00:00
Benjamin Kramer e7d26f9b49 Convert a couple of grep tests to FileCheck.
llvm-svn: 186250
2013-07-13 17:30:25 +00:00
Akira Hatanaka f2826aacf9 [mips] Remove trailing whitespace.
llvm-svn: 186230
2013-07-12 23:47:38 +00:00
Akira Hatanaka 66bc419366 [mips] Implement MipsTargetMachine::getInstrItineraryData().
llvm-svn: 186227
2013-07-12 23:33:22 +00:00
JF Bastien 583db65031 Fix ARM paired GPR COPY lowering
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.

My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.

The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.

llvm-svn: 186226
2013-07-12 23:33:03 +00:00
Benjamin Kramer e07b7a9f02 R600: Reapply testcase from r186178, the big endian issue should be fixed by r186196.
llvm-svn: 186209
2013-07-12 21:54:43 +00:00
Tom Stellard 6547fbee03 R600: Remove the fpconst64.ll test which was failing on non-x86 buildbots
I'm guessing the failure had something to do with the double precision
floating point constant used in the test.

llvm-svn: 186191
2013-07-12 19:29:54 +00:00
Tom Stellard ccae60acc3 R600/SI: Add support for f64 kernel arguments
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186182
2013-07-12 18:15:26 +00:00
Tom Stellard 4e1100ab75 R600/SI: Implement select and compares for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186181
2013-07-12 18:15:19 +00:00
Tom Stellard 8ed7b45da3 R600/SI: Add fsqrt pattern for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186180
2013-07-12 18:15:13 +00:00
Tom Stellard 2a6a610516 R600/SI: Add double precision fsub pattern for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186179
2013-07-12 18:15:08 +00:00
Tom Stellard ab8a8c84d4 R600/SI: SI support for 64bit ConstantFP
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186178
2013-07-12 18:15:02 +00:00
Tom Stellard 7512c0803c R600/SI: Add initial double precision support for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186177
2013-07-12 18:14:56 +00:00
Benjamin Kramer 068a2253e9 X86: Shrink certain forms of movsx.
In particular:
movsbw %al, %ax   --> cbtw
movswl %ax, %eax  --> cwtl
movslq %eax, %rax --> cltq

According to Intel's manual those have the same performance characteristics but
come with a smaller encoding.

llvm-svn: 186174
2013-07-12 18:06:44 +00:00
Stephen Lin fda967fdea X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible
Patch by Andrea Di Biagio

llvm-svn: 186165
2013-07-12 15:31:36 +00:00
Stephen Lin 764d8d3d6f Start using CHECK-LABEL in some tests.
llvm-svn: 186163
2013-07-12 14:54:12 +00:00
Richard Sandiford 17276d3567 [SystemZ] Add test missing from r186148
Sigh, twice in two days sorry.  One day I'll remember...

llvm-svn: 186150
2013-07-12 09:20:14 +00:00
Richard Sandiford 6d4bd28322 [SystemZ] Optimize sign-extends of vector setccs
Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.

I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.

llvm-svn: 186149
2013-07-12 09:17:10 +00:00
Richard Sandiford 3f0edc2903 [SystemZ] Improve spilling of LGDR and LDGR
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.

llvm-svn: 186147
2013-07-12 08:37:17 +00:00
Charles Davis e8f297ca94 Target/X86: Add explicit Win64 and System V/x86-64 calling conventions.
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.

Reviewers:

CC:

llvm-svn: 186144
2013-07-12 06:02:35 +00:00
Hal Finkel 4715081787 PPC: Add some missing V_SET0 patterns
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).

Another bug found by llvm-stress.

llvm-svn: 186108
2013-07-11 17:43:32 +00:00
Hal Finkel ff3ea8060c PPCDAGToDAGISel::isRunOfOnes should return false on zero
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-rage MB value (= 32).

To fix this, isRunOfOnes returns false on a zero input (and I've remove one
now-redundant guard).

llvm-svn: 186101
2013-07-11 16:31:51 +00:00
Richard Sandiford 4209e7f6c6 [SystemZ] Add testcase missing from r186073
llvm-svn: 186074
2013-07-11 09:10:38 +00:00
Richard Sandiford ea9b6aa20b [SystemZ] Use zeroing form of RISBG for shift-and-AND sequences
Extend r186072 to handle shifts and ANDs.

llvm-svn: 186073
2013-07-11 09:10:09 +00:00
Richard Sandiford 84f54a3bc9 [SystemZ] Use zeroing form of RISBG for some AND sequences
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.

It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.

llvm-svn: 186072
2013-07-11 08:59:12 +00:00
Hal Finkel 743b194084 RegScavenger should not exclude undef uses
When computing currently-live registers, the register scavenger excludes undef
uses. As a result, undef uses are ignored when computing the restore points of
registers spilled into the emergency slots. While the register scavenger
normally excludes from consideration, when scavenging, registers used by the
current instruction, we need to not exclude undef uses. Otherwise, we might end
up requiring more emergency spill slots than we have (in the case where the
undef use *is* the currently-spilled register).

Another bug found by llvm-stress.

llvm-svn: 186067
2013-07-11 05:55:57 +00:00
Hal Finkel 94383e542b Move r186044 tests into CodeGen/X86
I had thought that these tests could be target-neutral, but in practice this is
not the case (on some targets, like Hexagon and Darwin), they trigger an assert
(a different assert than the one that r186044 fixes).

llvm-svn: 186051
2013-07-11 01:55:55 +00:00
Michel Danzer 49812b5bbd R600/SI: Initial local memory support
Enough for the radeonsi driver to use it for calculating derivatives.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186012
2013-07-10 16:37:07 +00:00
Michel Danzer 8d69617b27 R600/SI: Add intrinsic for retrieving the current thread ID
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186010
2013-07-10 16:36:52 +00:00
Michel Danzer 83f87c4c2e R600/SI: Add intrinsics for texture sampling with user derivatives
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186008
2013-07-10 16:36:36 +00:00
Jim Grosbach ebcad2e063 ARM: Fix incorrect pack pattern for thumb2
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:

A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.

rdar://14338767

llvm-svn: 185982
2013-07-09 22:59:22 +00:00
Adrian Prantl 1014fcfd99 move test into the appropriate subdir.
llvm-svn: 185972
2013-07-09 21:44:11 +00:00
Adrian Prantl 418d1d1ea9 Reapply an improved version of r180816/180817.
Change the informal convention of DBG_VALUE machine instructions so that
we can express a register-indirect address with an offset of 0.
The old convention was that a DBG_VALUE is a register-indirect value if
the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE
is register-indirect if the first operand is a register and the second
operand is an immediate. For plain register values the combination reg,
reg is used. MachineInstrBuilder::BuildMI knows how to build the new
DBG_VALUES.

rdar://problem/13658587

llvm-svn: 185966
2013-07-09 20:28:37 +00:00
Stephen Lin 4ee5e873d5 Appease buildbots after r185956: just set -mcpu explicitly, as it should have been from the beginning.
llvm-svn: 185962
2013-07-09 19:27:10 +00:00
Stephen Lin 228765f61f Appease Atom buildbot after r185956 (explicitly turn on AVX)
llvm-svn: 185961
2013-07-09 18:55:52 +00:00
Hal Finkel e4dd5c29f0 WidenVecRes_BUILD_VECTOR must use the first operand's type
Because integer BUILD_VECTOR operands may have a larger type than the result's
vector element type, and all operands must have the same type, when widening a
BUILD_VECTOR node by adding UNDEFs, we cannot use the vector element type, but
rather must use the type of the existing operands.

Another bug found by llvm-stress.

llvm-svn: 185960
2013-07-09 18:55:10 +00:00
Bill Schmidt 4122169308 [PowerPC] Better fix for PR16556.
A more complete example of the bug in PR16556 was recently provided,
showing that the previous fix was not sufficient.  The previous fix is
reverted herein.

The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as
custom lowering for FP_TO_SINT during type legalization, without
checking whether the input type is handled by that routine.
LowerFP_TO_INT requires the input to be f32 or f64, so we fail when
the input is ppcf128.

I'm leaving the test case from the initial fix (r185821) in place, and
adding the new test as another crash-only check.

llvm-svn: 185959
2013-07-09 18:50:20 +00:00
Stephen Lin 73fa842e2e Attempt to appease buildbot after r185956 by explicitly turning setting -fma,-fma4 attrs (I'm assuming they're set because the bot is running on machine that has one or the other.)
llvm-svn: 185958
2013-07-09 18:41:43 +00:00
Stephen Lin 73de7bf5de AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956
2013-07-09 18:16:56 +00:00
Hal Finkel ff666bd962 Don't crash in SE dealing with ashr x, -1
ScalarEvolution::getSignedRange uses ComputeNumSignBits from ValueTracking on
ashr instructions. ComputeNumSignBits can return zero, but this case was not
handled correctly by the code in getSignedRange which was calling:
  APInt::getSignedMinValue(BitWidth).ashr(NS - 1)
with NS = 0, resulting in an assertion failure in APInt::ashr.

Now, we just return the conservative result (as with NS == 1).

Another bug found by llvm-stress.

llvm-svn: 185955
2013-07-09 18:16:16 +00:00
Hal Finkel 6c29bd9088 DAGCombine tryFoldToZero cannot create illegal types after type legalization
When folding sub x, x (and other similar constructs), where x is a vector, the
result is a vector of zeros. After type legalization, make sure that the input
zero elements have a legal type. This type may be larger than the result's
vector element type.

This was another bug found by llvm-stress.

llvm-svn: 185949
2013-07-09 17:02:45 +00:00
Ulrich Weigand 52cf8e4488 [PowerPC] Revert r185476 and fix up TLS variant kinds
In the commit message to r185476 I wrote:

>The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
>correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
>This causes some confusion with the asm parser, since VK_PPC_TLSGD
>is output as @tlsgd, which is then read back in as VK_TLSGD.
>
>To avoid this confusion, this patch removes the PowerPC-specific
>modifiers and uses the generic modifiers throughout.  (The only
>drawback is that the generic modifiers are printed in upper case
>while the usual convention on PowerPC is to use lower-case modifiers.
>But this is just a cosmetic issue.)

This was unfortunately incorrect, there is is fact another,
serious drawback to using the default VK_TLSLD/VK_TLSGD
variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT
to return true, which in turn causes the ELFObjectWriter to emit
an undefined reference to _GLOBAL_OFFSET_TABLE_.

This is a problem on powerpc64, because it uses the TOC instead
of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_,
so the symbol remains undefined.  This means shared libraries
using TLS built with the integrated assembler are currently
broken.

While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation
probably ought to be properly fixed at some point, for now I'm
simply reverting the r185476 commit.  Now this in turn exposes
the breakage of handling @tlsgd/@tlsld in the asm parser that
this check-in was originally intended to fix.

To avoid this regression, I'm also adding a different fix for
this problem: while common code now parses @tlsgd as VK_TLSGD,
a special hack in the asm parser translates this code to the
platform-specific VK_PPC_TLSGD that the back-end now expects.
While this is not really pretty, it's self-contained and
shouldn't hurt anything else for now.  One the underlying
problem is fixed, this hack can be reverted again.

llvm-svn: 185945
2013-07-09 16:41:09 +00:00
Vincent Lejeune ce499744b3 R600: Do not predicated basic block with multiple alu clause
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.

NOTE: This is a candidate for the stable branch.
llvm-svn: 185943
2013-07-09 15:03:33 +00:00
Vincent Lejeune b8aac8d720 R600: Fix a rare bug where swizzle optimization returns wrong values
llvm-svn: 185942
2013-07-09 15:03:25 +00:00
Vincent Lejeune a4d8d2ef2b R600: Fix wrong export reswizzling
llvm-svn: 185941
2013-07-09 15:03:19 +00:00
Vincent Lejeune b55940cc7d R600: Use DAG lowering pass to handle fcos/fsin
NOTE: This is a candidate for the stable branch.
llvm-svn: 185940
2013-07-09 15:03:11 +00:00
Alexander Potapenko 8d2d79d05f Revert r185872 - "Stop emitting weak symbols into the "coal" sections"
This patch broke `make check-asan` on Mac, causing ld warnings like the following one:

ld: warning: direct access in __GLOBAL__I_a to global weak symbol
___asan_mapping_scale means the weak symbol cannot be overridden at
runtime. This was likely caused by different translation units being
compiled with different visibility settings.

The resulting test binaries crashed with incorrect ASan warnings.

llvm-svn: 185923
2013-07-09 10:00:16 +00:00
Richard Sandiford 9784649157 [SystemZ] Use MVC for simple load/store pairs
Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap.  (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.

The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16.  The patch fixes that too.

llvm-svn: 185919
2013-07-09 09:46:39 +00:00
Richard Sandiford 47660c148c [SystemZ] Use "STC;MVC" for memset
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC.  As with memcpy, I'm leaving longer cases
till later.

The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.

llvm-svn: 185918
2013-07-09 09:32:42 +00:00
Hal Finkel dbbf09b28e PPC: Allocate RS spill slot for unaligned i64 load/store
This fixes another bug found by llvm-stress!

If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.

Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.

llvm-svn: 185907
2013-07-09 06:34:51 +00:00
Bill Wendling 0176708e85 Stop emitting weak symbols into the "coal" sections.
The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.

Replace the sections like this:

  __TEXT/__textcoal_nt   instead use  __TEXT/__text
  __TEXT/__const_coal    instead use  __TEXT/__const
  __DATA/__datacoal_nt   instead use  __DATA/__data

<rdar://problem/14265330>

llvm-svn: 185872
2013-07-08 21:34:52 +00:00
Ulrich Weigand 266db7fe04 [PowerPC] Always use "assembler dialect" 1
A setting in MCAsmInfo defines the "assembler dialect" to use.  This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:

  __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));

The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.

To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.

However, the current LLVM back-end uses:
  AssemblerDialect = 1;           // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
  AssemblerDialect = 0;           // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.

The Linux setting really isn't correct, we should be using new-style
mnemonics everywhere.  This is changed by this commit.

Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target.  This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr.  Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.

Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr.  (To do so
this patch implicitly also reverts commit 184441.)

llvm-svn: 185858
2013-07-08 20:20:51 +00:00
Hal Finkel 21ada79757 PPC: Mark vector CC action for SETO and SETONE as Expand
Another bug found by llvm-stress! This fixes hitting
  llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.

llvm-svn: 185855
2013-07-08 20:00:03 +00:00
Joey Gouly 392cdad2b1 Add a comment to this change, requested by Eric Christopher.
llvm-svn: 185853
2013-07-08 19:52:51 +00:00
Jim Grosbach 24e102a947 ARM: Improve codegen for generic vselect.
Fall back to by-element insert rather than building it up on the stack.

rdar://14351991

llvm-svn: 185846
2013-07-08 18:18:52 +00:00
Hal Finkel e39302258e PPC: Mark vector FREM as Expand by default
Another bug found by llvm-stress! This fixes crashing with:
  LLVM ERROR: Cannot select: v4f32 = frem ...

llvm-svn: 185840
2013-07-08 17:30:25 +00:00
Bill Schmidt 2db29ef467 [PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT).
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked.  A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context.  However, this isn't the case for an UNDEF node.

This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.

At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error.  At this point I
think simple is best.

The test case comes from PR16556, and is a crash-test only.

llvm-svn: 185821
2013-07-08 14:22:45 +00:00
Nico Rieck 51969be724 Reuse %rax after calling __chkstk on win64
Reapply this as I reverted the wrong commit.

llvm-svn: 185807
2013-07-08 11:20:11 +00:00
Nico Rieck 4801303ce1 Revert "Proper va_arg/va_copy lowering on win64"
This reverts commit 2b52880592a525cfe04d8f9008a35da8c2ea94c3.

Needs review.

llvm-svn: 185806
2013-07-08 11:19:44 +00:00
Richard Sandiford d131ff8cf8 [SystemZ] Use MVC for memcpy
Use MVC for memcpy in cases where a single MVC is enough.  Using MVC is
a win for longer copies too, but I'll leave that for later.

llvm-svn: 185802
2013-07-08 09:35:23 +00:00
Hal Finkel 8cb9a0e1d3 Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors
This fixes a bug (found by llvm-stress) in
DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
type would always be larger than the original operands. This is not always
true, however, with boolean vectors. For example, promoting a node of type v8i1
(where the operands will be of type i32, the type to which i1 is promoted) will
yield a node with a result vector element type of i16 (and operands of type
i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
to the result type.

llvm-svn: 185794
2013-07-08 06:16:58 +00:00
Nico Rieck 43b51056d6 Revert "Reuse %rax after calling __chkstk on win64"
This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e.

llvm-svn: 185781
2013-07-08 01:30:57 +00:00
Nico Rieck 7adf6111a8 Reuse %rax after calling __chkstk on win64
llvm-svn: 185778
2013-07-07 16:48:39 +00:00
Nico Rieck 99ef2890c0 Proper va_arg/va_copy lowering on win64
llvm-svn: 185763
2013-07-06 18:08:19 +00:00
Benjamin Kramer c7332b2796 DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe.
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and a very rare case. I left a TODO for
that case.

Fixes PR16551.

llvm-svn: 185755
2013-07-06 14:05:09 +00:00
Tim Northover dab4db5372 Stop putting operations after a tail call.
This prevents the emission of DAG-generated vreg definitions after a
tail call be dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).

llvm-svn: 185754
2013-07-06 12:58:45 +00:00
Arnold Schwaighofer 97c1343c45 ARM: Add a pack pattern for matching arithmetic shift right
llvm-svn: 185714
2013-07-05 18:57:49 +00:00
Arnold Schwaighofer 50b76b5226 ARM: Fix incorrect pack pattern
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.

radar://14338767

llvm-svn: 185712
2013-07-05 18:28:39 +00:00
Richard Sandiford c40f27b52d [SystemZ] Remove no-op MVCs
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring.  Extend it to cope with single instructions
that copy from one frame index to another.

The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.

llvm-svn: 185705
2013-07-05 14:38:48 +00:00
Richard Sandiford b5d9bd6f59 Fix double renaming bug in stack coloring pass
The stack coloring pass renumbered frame indexes with a loop of the form:

  for each frame index FI
    for each instruction I that uses FI
      for each use of FI in I
        rename FI to FI'

This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2.  The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.

In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.

This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn.  There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.

MachineMemOperands are more tricky because they can be shared between
instructions.  The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs.  We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.

llvm-svn: 185703
2013-07-05 14:24:47 +00:00
Richard Sandiford 8976ea72ab [SystemZ] Enable the use of MVC for frame-to-frame spills
...now that the problem that prompted the restriction has been fixed.

The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.

llvm-svn: 185701
2013-07-05 14:02:01 +00:00
Richard Sandiford 23943229f6 [SystemZ] Allocate a second register scavenging slot
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.

The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout.  I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.

llvm-svn: 185698
2013-07-05 13:11:52 +00:00
Joey Gouly 606f3fbc2b PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm.
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.

In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.

llvm-svn: 185688
2013-07-05 10:19:40 +00:00
Quentin Colombet 04b3a0fdb2 [ARM] Improve the instruction selection of vector loads.
In the ARM back-end, build_vector nodes are lowered to a target specific
build_vector that uses floating point type. 
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between integer unit and floating
point unit that may result in inefficient code.

In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.

In particular, this happens when loads are not aligned.

Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.

The patch uses a vector friendly sequence of code when the inserted bitcasts to
floating point survived DAGCombine.

This is done by a target specific DAGCombine that changes the target specific
build_vector into a sequence of insert_vector_elt that get rid of the bitcasts.

<rdar://problem/14170854>

llvm-svn: 185587
2013-07-03 21:42:57 +00:00
Ulrich Weigand 49f487e6cd [PowerPC] Use mtocrf when available
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.

Current code however always emits mtcrf.  This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible.  It does create
inefficient code with the integrated assembler, however.

To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everything.  Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.

As a side effect, this allows to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.

llvm-svn: 185561
2013-07-03 17:59:07 +00:00
Rafael Espindola b0fccb225c Prefix failing commands with not to make clear they are expected to fail.
llvm-svn: 185554
2013-07-03 16:41:29 +00:00
Rafael Espindola 8490bbd16b Remove another old test.
It was only passing because 'grep andpd' was not finding any andpd, but
we don't fail if part of a pipe fails.

llvm-svn: 185552
2013-07-03 16:35:26 +00:00
Rafael Espindola 447dbc38b6 Remove test for the old EH system. It doesn't parse anymore.
llvm-svn: 185551
2013-07-03 16:30:01 +00:00
Richard Sandiford ed1fab6b5b [SystemZ] Fold more spills
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>.  Use it to cut down on the number of spill loads.

Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.

This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch.  Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).

llvm-svn: 185529
2013-07-03 10:10:02 +00:00
Tim Northover 36b2417f18 ARM: relax the atomic release barrier to "dmb ishst" on Swift
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.

This patch makes use of that quirk.

llvm-svn: 185527
2013-07-03 09:20:36 +00:00
Richard Osborne a1cff61dec [XCore] Add ISel pattern for LDWCP
Patch by Robert Lytton.

llvm-svn: 185518
2013-07-03 07:48:50 +00:00
Ulrich Weigand 4050995650 [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.

To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout.  (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)

llvm-svn: 185476
2013-07-02 21:29:06 +00:00
Richard Sandiford e6e7885591 [SystemZ] Use DSGFR over DSGR in more cases
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).

The "32" in "SDIVREM32" just refers to the second operand.  The first operand
of all *DIVREM*s is a GR128.

llvm-svn: 185435
2013-07-02 15:40:22 +00:00
Richard Sandiford f6bae1e434 [SystemZ] Use MVC to spill loads and stores
Try to use MVC when spilling the destination of a simple load or the source
of a simple store.  As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet.  spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.

I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...

Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.

llvm-svn: 185434
2013-07-02 15:28:56 +00:00
Richard Osborne e4cc98686a [XCore] Fix instruction selection for zext, mkmsk instructions.
r182680 replaced CountLeadingZeros_32 with a template function
countLeadingZeros that relies on using the correct argument type to give
the right result. The type passed in the XCore backend after this
revision was incorrect in a couple of places.

Patch by Robert Lytton.

llvm-svn: 185430
2013-07-02 14:46:34 +00:00
Tim Northover 6823900e55 DAGCombiner: fix use-counting issue when forming zextload
DAGCombiner was counting all uses of a load node  when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.

rdar://problem/13896307

llvm-svn: 185419
2013-07-02 09:58:53 +00:00
Hal Finkel 52727c6b82 Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling
There are a couple of (small) related changes here:

1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.

2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).

3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.

4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.

With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register, however, because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.

llvm-svn: 185409
2013-07-02 03:39:34 +00:00
Bill Schmidt 48fc20a034 Index: test/CodeGen/PowerPC/reloc-align.ll
===================================================================
--- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
+++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
@@ -0,0 +1,34 @@
+; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
+
+; This test verifies that the peephole optimization of address accesses
+; does not produce a load or store with a relocation that can't be
+; satisfied for a given instruction encoding.  Reduced from a test supplied
+; by Hal Finkel.
+
+target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
+target triple = "powerpc64-unknown-linux-gnu"
+
+%struct.S1 = type { [8 x i8] }
+
+@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
+
+; Function Attrs: nounwind readonly
+define signext i32 @main() #0 {
+entry:
+  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
+; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
+  ret i32 %call
+}
+
+; Function Attrs: nounwind readonly
+define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
+entry:
+  %0 = bitcast %struct.S1* %p_91 to i64*
+  %bf.load = load i64* %0, align 1
+  %bf.shl = shl i64 %bf.load, 26
+  %bf.ashr = ashr i64 %bf.shl, 54
+  %bf.cast = trunc i64 %bf.ashr to i32
+  ret i32 %bf.cast
+}
+
+attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
===================================================================
--- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
@@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
       OutStreamer.EmitRawText(StringRef("\tmsync"));
       return;
     }
+    break;
+  case PPC::LD:
+  case PPC::STD:
+  case PPC::LWA: {
+    // Verify alignment is legal, so we don't create relocations
+    // that can't be supported.
+    // FIXME:  This test is currently disabled for Darwin.  The test
+    // suite shows a handful of test cases that fail this check for
+    // Darwin.  Those need to be investigated before this sanity test
+    // can be enabled for those subtargets.
+    if (!Subtarget.isDarwin()) {
+      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
+      const MachineOperand &MO = MI->getOperand(OpNum);
+      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
+        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
+    }
+    // Now process the instruction normally.
+    break;
   }
+  }
 
   LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
   OutStreamer.EmitInstruction(TmpInst);
Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
===================================================================
--- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
@@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
+        // We can't perform this optimization for data whose alignment
+        // is insufficient for the instruction encoding.
+        if (GV->getAlignment() < 4 &&
+            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
+             StorageOpcode == PPC::LWA)) {
+          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
+          continue;
+        }
         ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
       } else if (ConstantPoolSDNode *CP =
                  dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {

llvm-svn: 185380
2013-07-01 20:52:27 +00:00
Akira Hatanaka 8b5b1e072f [mips] Fix test case to check that mips64 instructions are generated.
llvm-svn: 185371
2013-07-01 20:18:58 +00:00