Commit Graph

5147 Commits

Author SHA1 Message Date
Chandler Carruth 8a3f20abac FileCheck-ize another test, and upgrade its syntax a bit.
llvm-svn: 134332
2011-07-02 20:42:17 +00:00
Chandler Carruth 38d367e473 FileCheck-ize another codegen test, tightening it up.
llvm-svn: 134331
2011-07-02 20:42:14 +00:00
Chandler Carruth bf252382b8 FileCheck-ize another test, making it much more precise for testing the
individual cases, while hard coding less about registers in use.

llvm-svn: 134330
2011-07-02 20:42:11 +00:00
Chandler Carruth 308b3b66b1 FileCheck-ize another test. This one is more clear and runs fewer
commands as a result.

llvm-svn: 134329
2011-07-02 20:42:08 +00:00
Chandler Carruth 334faf8f1a FileCheck-ize a test, no functionality changed.
llvm-svn: 134328
2011-07-02 20:42:06 +00:00
Jakob Stoklund Olesen 54f7c59c1a Better diagnostics when inline asm fails to allocate.
asm.c:2:7: error: ran out of registers during register allocation
  asm(""::"r"(0), "r"(1), "r"(2), "r"(3), "r"(4), "r"(5), "r"(6), "r"(7), "r"(8), "r"(9));
        ^

llvm-svn: 134310
2011-07-02 07:17:37 +00:00
Eric Christopher 2eca9d5ddf Be less specific about register allocation ordering.
llvm-svn: 134308
2011-07-02 04:06:41 +00:00
Eric Christopher a8a56f7e5c TargetConstant immediates won't be placed into registers so tighten
up the valid constant check earlier.

rdar://9692967

llvm-svn: 134286
2011-07-01 23:04:38 +00:00
Dan Gohman a293f24a0d Teach IVUsers to stop at non-affine expressions unless they are both
outside the loop and reducible.

This more completely hides them from LSR, which isn't usually able to
do anything meaningful with non-affine expressions anyway, and this
consequently hides them from SCEVExpander, which is acutely unprepared
for non-affine expressions.

Replace test/CodeGen/X86/lsr-nonaffine.ll with a new test that tests
the new behavior.

This works around the bug in PR10117 / rdar://problem/9633149, and is
generally an improvement besides.

llvm-svn: 134268
2011-07-01 22:05:19 +00:00
Jim Grosbach cf1464d943 ARMv7M vs. ARMv7E-M support.
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* archtitecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."

Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.

rdar://9572992

llvm-svn: 134261
2011-07-01 21:12:19 +00:00
Eric Christopher 29f1db85dd Add support for the 'j' immediate constraint. This is conditionalized on
supporting the instruction that the constraint is for 'movw'.

Part of rdar://9119939

llvm-svn: 134222
2011-07-01 01:00:07 +00:00
Eric Christopher c011d31543 Add support for the ARM 't' register constraint. And another testcase
for the 'x' register constraint.

Part of rdar://9119939

llvm-svn: 134220
2011-07-01 00:30:46 +00:00
Eric Christopher f1c74595aa Add support for the 'x' constraint.
Part of rdar://9307836 and rdar://9119939

llvm-svn: 134215
2011-07-01 00:14:47 +00:00
Jakob Stoklund Olesen d0e2352b65 Fix a problem with fast-isel return values introduced in r134018.
We would put the return value from long double functions in the wrong
register.

This fixes gcc.c-torture/execute/conversion.c

llvm-svn: 134205
2011-06-30 23:42:18 +00:00
Eric Christopher f45daac30f Add support for the 'h' constraint.
Part of rdar://9119939

llvm-svn: 134203
2011-06-30 23:23:01 +00:00
Jim Grosbach b98ab91e39 Thumb1 register to register MOV instruction is predicable.
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.

llvm-svn: 134197
2011-06-30 22:10:46 +00:00
Jim Grosbach 353da73186 Pseudo-ize the t2LDMIA_RET instruction.
It's just a t2LDMIA_UPD instruction with extra codegen properties, so it
doesn't need the encoding information. As a side-benefit, we now correctly
recognize for instruction printing as a 'pop' instruction.

llvm-svn: 134173
2011-06-30 18:25:42 +00:00
Eric Christopher c932173773 Fix a small thinko for constant i64 lock/orq optimization where we
we didn't have an opcode for 64-bit constant or expressions.

Fixes rdar://9692967

llvm-svn: 134121
2011-06-30 00:48:30 +00:00
Devang Patel 0eada03216 Revert r133953 for now.
llvm-svn: 134116
2011-06-29 23:50:13 +00:00
Cameron Zwarich 34c8f51d65 In the ARM global merging pass, allow extraneous alignment specifiers. This pass
already makes the assumption, which is correct on ARM, that a type's alignment is
less than its alloc size. This improves codegen with Clang (which inserts a lot of
extraneous alignment specifiers) and fixes <rdar://problem/9695089>.

llvm-svn: 134106
2011-06-29 22:24:25 +00:00
Benjamin Kramer d2a84f6a63 Don't depend on the optimization reverted in r134067.
llvm-svn: 134068
2011-06-29 14:07:18 +00:00
Benjamin Kramer 8665f8d916 Revert a part of r126557 which could create unschedulable DAGs.
llvm-svn: 134067
2011-06-29 13:47:25 +00:00
Jakob Stoklund Olesen 7297e7e223 Clean up the handling of the x87 fp stack to make it more robust.
Drop the FpMov instructions, use plain COPY instead.

Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.

This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.

As a bonus we handle floating point inline assembly correctly now.

llvm-svn: 134018
2011-06-28 18:32:28 +00:00
Roman Divacky 4394e68c24 Implement ISD::VAARG lowering on PPC32.
llvm-svn: 134005
2011-06-28 15:30:42 +00:00
Jakob Stoklund Olesen 74dd400410 FileCheckize a couple of tests.
Also and add a test for popping dead return values and avoid testing the
spill precision.

llvm-svn: 133997
2011-06-28 06:25:03 +00:00
Chandler Carruth e2a1b16963 FileCheck-ize a test that had the strangest TCL quote I've seen yet: an
opening single quote with no closing single quote, and with {} quotes
"inside" of it. This broke some of our tools that scrape test cases.

Also, while here, make the test actually assert what the comment says it
asserts. This was essentially authored by Nick Lewycky, and merely typed
in by myself. Let me know if this is still missing the mark, but the
previous test only succeeded due to the improper quoting preventing
*anything* from matching the grep -- it had a '4(%...)' sequence in the
output!

llvm-svn: 133980
2011-06-28 02:03:10 +00:00
Evan Cheng b7d00313dc Remove the experimental (and unused) pre-ra splitting pass. Greedy regalloc can split live ranges.
llvm-svn: 133962
2011-06-27 23:40:45 +00:00
Devang Patel 4dc034df1d During bottom up fast-isel, instructions emitted to materalize registers are at top of basic block and do not have debug location. This may misguide debugger while entering the basic block and sometimes debugger provides semi useful view of current location to developer by picking up previous known location as current location. Assign a sensible location to the first instruction in a basic block, if it does not have one location derived from source file, so that debugger can provide meaningful user experience to developers in edge cases.
llvm-svn: 133953
2011-06-27 22:32:04 +00:00
Eric Christopher aff7ed55bc Allow lr in the register options here.
llvm-svn: 133935
2011-06-27 20:31:01 +00:00
Jakob Stoklund Olesen f68976071d Move all inline-asm-fpstack tests to a single file.
Also fix some of the tests that were actually testing wrong behavior -
An input operand in {st} is only popped by the inline asm when {st} is
also in the clobber list.

The original bug reports all had ~{st} clobbers as they should.

llvm-svn: 133916
2011-06-27 17:27:37 +00:00
Dan Bailey b49b736519 PTX: corrected tests that were failing
llvm-svn: 133875
2011-06-25 19:41:17 +00:00
Dan Bailey b7ee561399 PTX: Reverting implementation of i8.
The .b8 operations in PTX are far more limiting than I first thought. The mov operation isn't even supported, so there's no way of converting a .pred value into a .b8 without going via .b16, which is
not sensible. An improved implementation needs to use the fact that loads and stores automatically extend and truncate to implement support for EXTLOAD and TRUNCSTORE in order to correctly support
boolean values.

llvm-svn: 133873
2011-06-25 18:16:28 +00:00
Chad Rosier f3e11190f3 Test case for r133858 (tail call optimize in the presence of byval).
llvm-svn: 133863
2011-06-25 02:44:56 +00:00
Devang Patel f071d72c44 Handle debug info for i128 constants.
llvm-svn: 133821
2011-06-24 20:46:11 +00:00
Dan Bailey dd01c4ac7a PTX: Add support for i8 type and introduce associated .b8 registers
The i8 type is required for boolean values, but can only use ld, st and mov instructions. The i1 type continues to be used for predicates.

llvm-svn: 133814
2011-06-24 19:27:10 +00:00
Chad Rosier fa8d89327f The Neon VCVT (between floating-point and fixed-point, Advanced SIMD)
instructions can be used to match combinations of multiply/divide and VCVT 
(between floating-point and integer, Advanced SIMD).  Basically the VCVT 
immediate operand that specifies the number of fraction bits corresponds to a 
floating-point multiply or divide by the corresponding power of 2.

For example, VCVT (floating-point to fixed-point, Advanced SIMD) can replace a 
combination of VMUL and VCVT (floating-point to integer) as follows:

Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
  vmul.f32        d16, d17, d16
  vcvt.s32.f32    d16, d16
becomes:
  vcvt.s32.f32    d16, d16, #3

Similarly, VCVT (fixed-point to floating-point, Advanced SIMD) can replace a 
combinations of VCVT (integer to floating-point) and VDIV as follows:

Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
  vcvt.f32.s32    d16, d16
  vdiv.f32        d16, d17, d16
becomes:
  vcvt.f32.s32    d16, d16, #3

llvm-svn: 133813
2011-06-24 19:23:04 +00:00
Akira Hatanaka 35792089e7 Change the chain input of nodes that load the address of a function. This change
enables SelectionDAG::getLoad at MipsISelLowering.cpp:1914 to return a
pre-existing node instead of redundantly create a new node every time it is
called.

llvm-svn: 133811
2011-06-24 19:01:25 +00:00
Akira Hatanaka ca88b4abec Prevent generation of redundant addiu instructions that compute address of
static variables or functions. 

llvm-svn: 133803
2011-06-24 17:55:19 +00:00
Justin Holewinski 6cdd72a9ca PTX: Always use registers for return values, but use .param space for device
parameters if SM >= 2.0

- Update test cases to be more robust against register allocation changes
- Bump up the number of registers to 128 per type
- Include Python script to re-generate register file with any number of
  registers

llvm-svn: 133736
2011-06-23 18:10:13 +00:00
Justin Holewinski a4aecf3dc4 PTX: Fixup test cases for device param changes
llvm-svn: 133735
2011-06-23 18:10:08 +00:00
Andrew Trick 67ff0718a4 lit support for REQUIRES: asserts.
Take #2. Don't piggyback on the existing config.build_mode. Instead,
define a new lit feature for each build feature we need (currently
just "asserts"). Teach both autoconf'd and cmake'd Makefiles to define
this feature within test/lit.site.cfg. This doesn't require any lit
harness changes and should be more robust across build systems.

llvm-svn: 133664
2011-06-22 23:23:19 +00:00
Rafael Espindola 2496c1f1f8 Reenable tail duplication of bb with just an unconditional jump, but
don't remove blocks that have their address taken.

llvm-svn: 133659
2011-06-22 22:31:57 +00:00
Nick Lewycky 90e6a4e5d5 Needs a triple.
llvm-svn: 133634
2011-06-22 19:42:14 +00:00
Nick Lewycky 6208a2fd66 Emit trailing padding on constant vectors when TargetData says that the vector
is larger than the sum of the elements (including per-element padding).

llvm-svn: 133631
2011-06-22 18:55:03 +00:00
Justin Holewinski 6fafebfb6a PTX: Add signed integer comparisons
llvm-svn: 133599
2011-06-22 02:09:50 +00:00
Justin Holewinski 54e3c0f5d9 PTX: Add .address_size directive if PTX version >= 2.3
Patch by Wei-Ren Chen

llvm-svn: 133589
2011-06-22 00:43:56 +00:00
Devang Patel c93ef81e24 Test case for r133560.
llvm-svn: 133585
2011-06-22 00:03:42 +00:00
Bob Wilson 646dd0f4d1 Revert r133452: "Emit movq for 64-bit register to XMM register moves..."
This is breaking compiler-rt and llvm-gcc builds on MacOSX when not using
the integrated assembler.

llvm-svn: 133524
2011-06-21 17:35:13 +00:00
Anna Zaks 083f0b5a7e Add support for sadd.with.overflow and uadd.with.overflow intrinsics to the CBackend by emitting definitions for each intrinsic that occurs in the module.
llvm-svn: 133522
2011-06-21 17:18:15 +00:00
Evan Cheng 4c0bd9629d Teach dag combine to match halfword byteswap patterns.
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
   => (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
   => (rotl (bswap x) 16)

This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.

rdar://9609108

llvm-svn: 133503
2011-06-21 06:01:08 +00:00
Akira Hatanaka 4c406e7457 Re-apply 132758 and 132768 which were speculatively reverted in 132777.
llvm-svn: 133494
2011-06-21 00:40:49 +00:00
Justin Holewinski cd4484d25d PTX: Fix conversion between predicates and value types
llvm-svn: 133454
2011-06-20 18:42:48 +00:00
Nick Lewycky c7df192279 Emit movq for 64-bit register to XMM register moves, but continue to accept
movd when assembling.

llvm-svn: 133452
2011-06-20 18:33:26 +00:00
Roman Divacky 254f82112d Don't apply on PPC64 the 32bit ADDIC optimizations as there's no overflow
with 32bit values.

llvm-svn: 133439
2011-06-20 15:28:39 +00:00
Nadav Rotem d34ce4344b Fix PromoteIntRes_TRUNCATE: Add support for cases where the
source vector type is to be split while the target vector is to be promoted.
(eg: <4 x i64> -> <4 x i8> )

llvm-svn: 133424
2011-06-20 07:15:58 +00:00
Benjamin Kramer f3c6d6def5 Update test.
llvm-svn: 133390
2011-06-19 12:14:34 +00:00
Nadav Rotem 6d0036e259 Reduce the runtime of the test. Keep only the interesting cases.
llvm-svn: 133381
2011-06-19 08:12:43 +00:00
Chris Lattner 8936d2bfbc Remove support for parsing the "type i32" syntax for defining a numbered
top level type without a specified number.  This syntax isn't documented
and blocks forward progress.

llvm-svn: 133371
2011-06-19 00:03:46 +00:00
Chris Lattner 80ed9dc9e5 rip out a ton of intrinsic modernization logic from AutoUpgrade.cpp, which is
for pre-2.9 bitcode files.  We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.

As usual, updating the testsuite is a PITA.

llvm-svn: 133337
2011-06-18 06:05:24 +00:00
Jakob Stoklund Olesen 831ae0105a Switch ARM to using AltOrders instead of MethodBodies.
This slightly changes the GPR allocation order on Darwin where R9 is not
a callee-saved register:

Before: %R0 %R1 %R2 %R3 %R12 %R9 %LR %R4 %R5 %R6 %R8 %R10 %R11
After:  %R0 %R1 %R2 %R3 %R9 %R12 %LR %R4 %R5 %R6 %R8 %R10 %R11
llvm-svn: 133326
2011-06-18 01:14:46 +00:00
Galina Kistanova 197ea52d4b Moved to the right place.
llvm-svn: 133324
2011-06-18 00:59:37 +00:00
Eric Christopher e4a1266a9a Fix UMULO support for 2x register width to allow the full
range without a libcall to a new mulo<mode> libcall
that we'd have to create.

Finishes the rest of rdar://9090077 and rdar://9210061

llvm-svn: 133318
2011-06-18 00:09:57 +00:00
Nadav Rotem ea7822685a Fix a bug in the type-lowering of integer-promoted elements. Add a check that
the newly created simple type is valid before checking its legality.
Re-commit the test file.

llvm-svn: 133291
2011-06-17 20:54:12 +00:00
Evan Cheng 7552a62af5 Add an alternative rev16 pattern. We should figure out a better way to handle these complex rev patterns. rdar://9609108
llvm-svn: 133289
2011-06-17 20:47:21 +00:00
Eric Christopher 5bbb2bdb46 Lower multiply with overflow checking to __mulo<mode>
calls if we haven't been able to lower them any
other way.

Fixes rdar://9090077 and rdar://9210061

llvm-svn: 133288
2011-06-17 20:41:29 +00:00
Galina Kistanova ac6bc75030 est 2008-06-04-indirectmem.ll is X86-specific. Move to X86 folder.
llvm-svn: 133275
2011-06-17 18:26:23 +00:00
Chris Lattner 6bc5c89093 Stop accepting and ignoring attributes in function types. Attributes are applied
to functions and call/invokes, not to types.

llvm-svn: 133266
2011-06-17 17:37:13 +00:00
Roman Divacky d041962c20 Fix a few places where 32bit instructions/registerset were used on PPC64.
llvm-svn: 133260
2011-06-17 15:21:10 +00:00
Justin Holewinski 3604d9a421 PTX: Adjust rounding modes
* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)

Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.

All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.

Patch by Dan Bailey

llvm-svn: 133253
2011-06-17 12:12:42 +00:00
Chris Lattner 5756c16cdf make the asmparser reject function and type redefinitions. 'Merging' hasn't been
needed since llvm-gcc 3.4 days.

llvm-svn: 133248
2011-06-17 07:06:44 +00:00
Chris Lattner 59345c8b65 remove asmparser support for the old getresult instruction, which has been subsumed by extractvalue.
llvm-svn: 133247
2011-06-17 06:57:15 +00:00
Chris Lattner 33de427cd6 remove parser support for the obsolete "multiple return values" syntax, which
was replaced with return of a "first class aggregate".

llvm-svn: 133245
2011-06-17 06:49:41 +00:00
Chris Lattner def1949c00 Remove support for using "foo" as symbols instead of %"foo". This is ancient
syntax and has been long obsolete.  As usual, updating the tests is the nasty
part of this.

llvm-svn: 133242
2011-06-17 06:36:20 +00:00
Chris Lattner b90ed2233c manually upgrade a bunch of tests to modern syntax, and remove some that
are either unreduced or only test old syntax.

llvm-svn: 133228
2011-06-17 03:14:27 +00:00
Cameron Zwarich 033026ffc0 Update an insertion point iterator after replacing a return instruction with a
tail call pseudoinstruction. This fixes <rdar://problem/9624333>.

llvm-svn: 133227
2011-06-17 02:16:43 +00:00
Jakob Stoklund Olesen c826df9506 Don't use register classes larger than TLI->getRegClassFor(VT).
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).

<rdar://problem/9624323>

llvm-svn: 133210
2011-06-16 22:50:38 +00:00
Nick Lewycky 65c47187f2 There's no need to be so picky about the particular register.
llvm-svn: 133189
2011-06-16 21:00:00 +00:00
Justin Holewinski 7f191b2a3b PTX: Finish new calling convention implementation
llvm-svn: 133172
2011-06-16 17:50:00 +00:00
Bruno Cardoso Lopes bbf2ab990f Add AVX suport for fpextend.
Original patch by Syoyo Fujita with more comments by me.

llvm-svn: 133153
2011-06-16 07:03:21 +00:00
Eli Friedman 9b3d779537 FileCheck-ize test, and make it work on EABI hosts, like clang-native-arm-cortex-a9.
llvm-svn: 133139
2011-06-16 02:36:32 +00:00
Eli Friedman 575d0163bb Force a triple here so this test doesn't fail on EABI hosts (like clang-native-arm-cortex-a9).
llvm-svn: 133134
2011-06-16 01:49:31 +00:00
Nick Lewycky 27d604cbf3 Commit the right set of tests for r133124. Sorry 'bout that!
llvm-svn: 133133
2011-06-16 01:35:45 +00:00
Andrew Trick 41369d5e8a Reenabling this test with REQUIRES: Asserts
llvm-svn: 133132
2011-06-16 01:34:41 +00:00
Chad Rosier aed609da92 Typos.
llvm-svn: 133128
2011-06-16 01:24:24 +00:00
Chad Rosier 2730162bee Revision r128665 added an optimization to make use of NEON multiplier
accumulator forwarding.  Specifically (from SVN log entry):

Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.

llvm-svn: 133127
2011-06-16 01:21:54 +00:00
Nick Lewycky 6d677cfdd8 Add a DAGCombine for (ext (binop (load x), cst)).
llvm-svn: 133124
2011-06-16 01:15:49 +00:00
Anna Zaks a56e84e439 Rename the test. Thanks Cameron! Use shorter/generic names.
llvm-svn: 133115
2011-06-16 00:34:10 +00:00
Anna Zaks 2c2aa9a9be Function::getNumBlockIDs() should be used instead of Function::size() to set the upper limit on the block IDs since basic blocks might get removed (simplified away) after being initially numbered. Plus the test case, in which SelectionDAGBuilder::visitBr() calls llvm::MachineFunction::removeFromMBBNumbering(), which introduces the hole in numbering leading to an assert in llc (prior to the fix).
llvm-svn: 133113
2011-06-16 00:03:21 +00:00
Rafael Espindola 10028230cf Testcase for previous commit.
llvm-svn: 133089
2011-06-15 21:18:51 +00:00
John McCall 4b7a8d68ae Add a new function attribute, nonlazybind, which inhibits lazy-loading
optimizations when emitting calls to the function;  instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call.  This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.

Currently only implemented for x86-64, where it turns into a call through
the global offset table.

Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.

llvm-svn: 133080
2011-06-15 20:36:13 +00:00
Andrew Trick 967d584a3a Disabling this test until I can figure out the right lit flags.
llvm-svn: 133068
2011-06-15 18:25:38 +00:00
Jakob Stoklund Olesen 5977109f14 Remove custom allocation orders in SystemZ.
Note that this actually changes code generation, and someone who
understands this target better should check the changes.

- R12Q is now allocatable. I think it was omitted from the allocation
  order by mistake since it isn't reserved. It as apparently used as a
  GOT pointer sometimes, and it should probably be reserved if that is
  the case.

- The GR64 registers are allocated in a different order now. The
  register allocator will automatically put the CSRs last. There were
  other changes to the order that may have been significant.

The test fix is because r0 and r1 swapped places in the allocation order.

llvm-svn: 133067
2011-06-15 18:02:56 +00:00
Evan Cheng 678b691aa3 Another revsh pattern. rdar://9609059
llvm-svn: 133064
2011-06-15 17:17:48 +00:00
Andrew Trick 3013b6ae4a Added -stress-sched flag in the Asserts build.
Added a test case for handling physreg aliases during pre-RA-sched.

llvm-svn: 133063
2011-06-15 17:16:12 +00:00
Chad Rosier 19a1f425a7 TargetLoweringOpt is a struct used by DAGCombine, not a pass.
llvm-svn: 133062
2011-06-15 16:48:02 +00:00
Nadav Rotem 24c6558865 This test was failing on X86 machines which do not have SSE4. Fixed the test by
specifying that the target CPU is corei7.

llvm-svn: 133053
2011-06-15 12:26:53 +00:00
Evan Cheng 6d02d9044b PerformBFICombine - (bfi A, (and B, Mask1), Mask2) -> (bfi A, B, Mask2) iff
the bits being cleared by the AND are not demanded by the BFI.

The previous BFI dag combine rule was actually incorrect (or used to be
correct until BFI representation changed).

rdar://9609030

llvm-svn: 133034
2011-06-15 01:12:31 +00:00
Tanya Lattner e9e6705cf9 Add an optimization that looks for a specific pair-wise add pattern and generates a vpaddl instruction instead of scalarizing the add.
Includes a test case.

llvm-svn: 133027
2011-06-14 23:48:48 +00:00
Rafael Espindola 2efebb3610 Add triple.
llvm-svn: 133026
2011-06-14 23:47:36 +00:00
Chad Rosier 818e116723 When pattern matching during instruction selection make sure shl x,1 is not
converted to add x,x if x is a undef.  add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.

llvm-svn: 133022
2011-06-14 22:29:10 +00:00
Rafael Espindola 1bf96ac607 Check the llc output.
llvm-svn: 133021
2011-06-14 22:24:32 +00:00
Stuart Hastings f96281f4b7 Test case for x86 MMX inline asm. rdar://problem/8886707
llvm-svn: 133014
2011-06-14 21:51:38 +00:00
Rafael Espindola 761cee0785 Add a test for the recent regression.
llvm-svn: 133009
2011-06-14 20:38:50 +00:00
Dan Gohman 8355febbf4 This test is still failing. Delete the rest of it.
llvm-svn: 133001
2011-06-14 18:07:36 +00:00
Dan Gohman 92789eafe5 Revert r132991. This test is failing on the
llvm-gcc-x86_64-linux-selfhost buildbot and others.

llvm-svn: 133000
2011-06-14 18:03:11 +00:00
Rafael Espindola 3aeaf9e4c1 Add 132986 back, but avoid non-determinism if a bb address gets reused.
llvm-svn: 132995
2011-06-14 15:31:54 +00:00
Nadav Rotem 0e230bc7bb Add a testcase for #9623
llvm-svn: 132991
2011-06-14 13:23:10 +00:00
Rafael Espindola 06ba7a68de revert 132986 to see if the bots go green.
llvm-svn: 132988
2011-06-14 12:48:26 +00:00
Nadav Rotem a0da74677e This testcase cause a failure on some bots. Remove the failing test until
further investigation.

llvm-svn: 132986
2011-06-14 09:10:37 +00:00
Nadav Rotem 10193c830b Add a testcase for checking the integer-promotion of many different vector
types (with power of two types such as 8,16,32 .. 512).

Fix a bug in the integer promotion of bitcast nodes. Enable integer expanding
only if the target of the conversion is an integer (when the type action is
scalarize).

Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.

llvm-svn: 132985
2011-06-14 08:11:52 +00:00
Rafael Espindola 844485af13 Implement Jakob's suggestion on how to detect fall thought without calling
AnalyzeBranch.

llvm-svn: 132981
2011-06-14 06:08:32 +00:00
Bruno Cardoso Lopes 29386fb10d Since ARM's prefetch implementation predicted the presence of a instruction
cache prefetch and now that the info from "prefetch" to "ARMPreload" is present,
only add a testcase for PLI.

llvm-svn: 132978
2011-06-14 05:11:46 +00:00
Bruno Cardoso Lopes dc9ff3a4b1 Add one more argument to the prefetch intrinsic to indicate whether it's a data
or instruction cache access. Update the targets to match it and also teach
autoupgrade.

llvm-svn: 132976
2011-06-14 04:58:37 +00:00
Rafael Espindola da24f2f8e1 Make the threshold used by branch folding softer. Before we would get a
sharp all or nothing transition when one extra predecessor was added. Now
we still test first ones for merging.

llvm-svn: 132974
2011-06-14 04:41:17 +00:00
Bill Wendling e712449688 Heuristic: If the number of operands in the alias are more than the number of
operands in the aliasee, don't print the alias.

llvm-svn: 132963
2011-06-14 03:17:20 +00:00
Jakob Stoklund Olesen fb03a92c33 Be less aggressive about hinting in RAFast.
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.

The attached test case provides an example where the fast allocator
would spill a register when other registers are available.

llvm-svn: 132900
2011-06-13 03:26:46 +00:00
Rafael Espindola 2f3c2fe7c5 Really fix the fall-through logic.
Add a triple to the tests.

llvm-svn: 132885
2011-06-12 05:57:01 +00:00
Rafael Espindola cb55e752ed Test for the previous commit.
llvm-svn: 132884
2011-06-12 05:35:39 +00:00
Rafael Espindola defd4b0875 AnalyzeBranch doesn't change which successors a bb has, just the order
we try to branch to them.

Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for

-----------
...
jne foo
jmp bar

foo:
----------

llvm-svn: 132882
2011-06-12 03:20:32 +00:00
Eli Friedman cd2124a3f0 Add full x86 fast-isel support for memcpy and memset.
rdar://9431466

llvm-svn: 132864
2011-06-10 23:39:36 +00:00
Eli Friedman 82818e4d95 Add -mattr=+sse2 to make the buildbots happy.
llvm-svn: 132839
2011-06-10 08:26:26 +00:00
Chad Rosier 19d5253f60 Adding a test case for revision 132825.
llvm-svn: 132830
2011-06-10 02:44:19 +00:00
Eli Friedman e3944cd825 Add a simple test which makes sure folding immediate float zero to a memory operand works.
llvm-svn: 132824
2011-06-10 00:30:08 +00:00
Cameron Zwarich 361548d4b4 A CCState was being created without setting whether it is in the Call or Prologue state,
causing an assertion failure downstream. This fixes <rdar://problem/9562908>.

This really seems like it should always be set at CCState creation time, so mistakes like
this can never happen. I'll take a look at doing that.

llvm-svn: 132811
2011-06-09 22:30:07 +00:00
Eli Friedman 1877ac9937 Change this DAGCombine to build AND of SHR instead of SHR of AND; this matches the ordering we prefer in instcombine. Part of rdar://9562809.
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.

llvm-svn: 132809
2011-06-09 22:14:44 +00:00
Eric Christopher f15601f19a Speculatively revert 132758 and 132768 to try to fix the Windows buildbots.
llvm-svn: 132777
2011-06-09 16:03:19 +00:00
Eric Christopher cafa08cbf3 Recommit r132764 since it didn't cause the windows buildbot failures.
llvm-svn: 132776
2011-06-09 15:39:01 +00:00
Eric Christopher 76fd742d16 Temporarily revert 132764 to see if it fixes the Windows buildbot.
llvm-svn: 132771
2011-06-09 06:29:54 +00:00
Akira Hatanaka 0683a7212e Initial support for inline asm memory operand constraints.
llvm-svn: 132768
2011-06-09 03:31:05 +00:00
Eric Christopher 11edab6a46 If the alignment of the byval argument is greater than the alignment
of the frame then increase the maximum alignment of the frame to
match.

Fixes PR6965

llvm-svn: 132764
2011-06-09 00:15:19 +00:00
Akira Hatanaka 4e9af454f7 Fix bug in lowering of DYNAMIC_STACKALLOC nodes. The correct offset of the
dynamically allocated stack area was not set.

llvm-svn: 132758
2011-06-08 21:28:09 +00:00
Cameron Zwarich 2e252de512 Fix an issue where the two-address conversion pass incorrectly rewrites untied
operands to an early clobber register. This fixes <rdar://problem/9566076>.

llvm-svn: 132738
2011-06-07 23:54:00 +00:00
Rafael Espindola c85e0d81e4 Fix a silly error I introduce in r131951.
Fixes PR10095.

llvm-svn: 132735
2011-06-07 23:26:45 +00:00
Stuart Hastings 7ae360f2e1 Tweak this test for ARM-hosted 'bot.
llvm-svn: 132711
2011-06-07 15:23:11 +00:00
Nadav Rotem d1e8f9a1e0 Move the legalizer tests to the X86 directory because the test uses the x86
codegen. Thanks Galina.

llvm-svn: 132706
2011-06-07 05:23:58 +00:00
Akira Hatanaka 08b7a779ef Add test case for C++ exception handling and fix the following mistakes in MipsFrameLowering::emitPrologue:
- cfi directives are not inserted at the right location or in the right order.
- The source MachineLocation for the cfi directive that changes the cfa register
  to $fp should be MachineLocation::VirtualFP.
- A PROLOG_LABEL that marks the beginning of cfi_offset directives for
  callee-saved register is emitted even when no callee-saved registers are
  saved.
- When a callee-saved double precision register is saved, two cfi_offset
  directives, one for each of the paired single precision registers, should be
  emitted.
 
 

llvm-svn: 132703
2011-06-07 02:17:21 +00:00
Jakob Stoklund Olesen df476270eb Simplify local live range splitting's safeguard to fix PR10070.
When local live range splitting creates a live range with the same
number of instructions as the old range, mark it as RS_Local. When such
a range is seen again, require that it be split in a way that reduces
the number of instructions. That guarantees we are making progress while
still being able to perform 3 -> 2+3 splits as required by PR10070.

This also means that the PrevSlot map is no longer needed. This was also
used to estimate new spill weights, but that is no longer necessary
after slotIndexes::insertMachineInstrInMaps() got the extra Late
insertion argument.

llvm-svn: 132697
2011-06-06 23:55:20 +00:00
Stuart Hastings e0d3426e1a Followup to 132458, omit unnecessary stack copy when x87 input is a
load.  rdar://problem/6373334

llvm-svn: 132696
2011-06-06 23:15:58 +00:00
Nadav Rotem c807fa5687 Add methods to support the integer-promotion of vector types. Methods to
legalize SDNodes such as BUILD_VECTOR, EXTRACT_VECTOR_ELT, etc.

llvm-svn: 132689
2011-06-06 20:55:56 +00:00
Stuart Hastings 2f7f64f9e1 Test case for PR10085.
llvm-svn: 132682
2011-06-06 20:03:22 +00:00
Eli Friedman bd375f1a3f PR10077: fix fast-isel of extractvalue of aggregate constants.
llvm-svn: 132676
2011-06-06 05:46:34 +00:00
Benjamin Kramer 59652d36a5 Harden tests for windows path separators.
llvm-svn: 132671
2011-06-05 18:20:05 +00:00
Jakob Stoklund Olesen 38080e8700 Fix a test that keeps breaking when allocation orders change.
Who said FileCheck couldn't handle arbitrarily complex conditions?

llvm-svn: 132654
2011-06-04 23:34:40 +00:00
Nadav Rotem 06bd6d304e TypeLegalizer: Add support for passing of vector-promoted types in registers (copyFromParts/copyToParts).
llvm-svn: 132649
2011-06-04 20:58:08 +00:00
Stuart Hastings be605494ac Reapply 132424 with fixes. This fixes PR10068.
rdar://problem/5993888

llvm-svn: 132606
2011-06-03 23:53:54 +00:00
Jakob Stoklund Olesen 496fa5556f Fix some tests that depend on register allocation.
llvm-svn: 132602
2011-06-03 22:45:21 +00:00
Eric Christopher 1e3e8933ed Another possible bug. Stopgap until we can autogenerate tables and
constraint lengths.

Part of rdar://9037836 and rdar://9119939

llvm-svn: 132598
2011-06-03 22:09:12 +00:00
Eric Christopher 761a5d4280 Fix an off by one error.
Part of rdar://9037836 and rdar://9119939

llvm-svn: 132590
2011-06-03 20:44:52 +00:00
Jakob Stoklund Olesen b8bf3c0f8b Switch AllocationOrder to using RegisterClassInfo instead of a BitVector
of reserved registers.

Use RegisterClassInfo in RABasic as well. This slightly changes som
allocation orders because RegisterClassInfo puts CSR aliases last.

llvm-svn: 132581
2011-06-03 20:34:53 +00:00
Eric Christopher 354b2a25f3 Make the Uv constraint a memory operand. This doesn't solve the
addressing mode problem mentioned in r132559.

Backend part of rdar://9037836 and part of rdar://9119939

llvm-svn: 132561
2011-06-03 17:24:37 +00:00
Roman Divacky a4a59aebd9 Fix wrong usages of CTR/MCTR where CTR8/MCTR8 was meant.
- Check for MTCTR8 in addition to MTCTR when looking up a hazard.

- When lowering an indirect call use CTR8 when targeting 64bit.

- Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND.

The last change fixes PR8487. With those changes, we are able to compile a
running "ls" and "sh" on FreeBSD/PowerPC64.

llvm-svn: 132552
2011-06-03 15:47:49 +00:00
Eli Friedman 86585798af Add ARM fast-isel support for materializing the address of a global in cases where the global uses an indirect symbol.
rdar://9431157

llvm-svn: 132522
2011-06-03 01:13:19 +00:00
Devang Patel e5feef0fe1 During post RA scheduling, do not try to chase reg defs. to preserve DBG_VALUEs. This approach has several downsides, for example, it does not work when dbg value is a constant integer, it does not work if reg is defined more than once, it places end of debug value range markers in the wrong place. It even causes misleading incorrect debug info when duplicate DBG_VALUE instructions point to same reg def.
Instead, use simpler approach and let DBG_VALUE follow its predecessor instruction. After live debug value analysis pass, all DBG_VALUE instruction are placed at the right place. Thanks Jakob for the hint!

llvm-svn: 132483
2011-06-02 20:07:12 +00:00
Rafael Espindola e37b939793 Add test for PR10068.
llvm-svn: 132482
2011-06-02 20:02:48 +00:00
Rafael Espindola aa318ae495 Revert 132424 to fix PR10068.
llvm-svn: 132479
2011-06-02 19:57:47 +00:00
Stuart Hastings 351422bdc8 Andy pointed out a dumb omission in this test case. Thanks Andy!
llvm-svn: 132477
2011-06-02 19:26:49 +00:00
Stuart Hastings e239a6920e Jakob pointed out a dumb omission in this test case. Thanks Jakob!
llvm-svn: 132472
2011-06-02 18:44:05 +00:00
Stuart Hastings 8d530ad22a Omit unnecessary stack copy when x87 input is a load.
rdar://problem/6373334

llvm-svn: 132458
2011-06-02 15:57:11 +00:00
Stuart Hastings 7f25c32d5b Tweak testcase for ARM bot. rdar://problem/5993888
llvm-svn: 132454
2011-06-02 05:05:39 +00:00
Akira Hatanaka 2446869410 Detect FI|cst pattern in MipsDAGToDAGISel::SelectAddr. Patch by Sasa Stankovic.
llvm-svn: 132448
2011-06-02 01:03:14 +00:00
Akira Hatanaka d84c76f2a7 Test case for r132444.
llvm-svn: 132445
2011-06-02 00:25:53 +00:00
Devang Patel 324f843107 Do not drop constant values when a variable's content is described using .debug_loc entries.
llvm-svn: 132427
2011-06-01 22:03:25 +00:00
Stuart Hastings 7adc95f69e Recommit 132404 with fixes. rdar://problem/5993888
llvm-svn: 132424
2011-06-01 21:33:14 +00:00
Eric Christopher 690030c116 Allow bitcasts between valid types of the same size and vector
types if the vector type is legal.

Fixes rdar://9306086

llvm-svn: 132420
2011-06-01 19:55:10 +00:00
Stuart Hastings aab130d995 Revert 132404 to appease a buildbot. rdar://problem/5993888
llvm-svn: 132419
2011-06-01 19:52:20 +00:00
Stuart Hastings 41b1aa466d Cleanup test case. rdar://problem/5660695
llvm-svn: 132408
2011-06-01 18:23:14 +00:00
Stuart Hastings 7b7c102f2c Add support for x86 CMPEQSS and friends. These instructions do a
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs.  Only profitable when the user wants a materialized 0
or 1 at runtime.  rdar://problem/5993888

llvm-svn: 132404
2011-06-01 17:17:45 +00:00
Stuart Hastings 6f89e2ffaa A forthcoming SSE patch will break this test; since the test is also
valid for x87, re-target to x87.  rdar://problem/5993888

llvm-svn: 132401
2011-06-01 16:13:09 +00:00
Stuart Hastings 4d2fe66dc0 Test case for 132396. rdar://problem/5660695
llvm-svn: 132399
2011-06-01 15:50:29 +00:00
Nadav Rotem 8b24a731f2 This patch is another step in the direction of adding vector select. In this
patch we add a flag to enable a new type legalization decision - to promote
integer elements in vectors. Currently, the rest of the codegen does not support
this kind of legalization.  This flag will be removed when the transition is
complete.

llvm-svn: 132394
2011-06-01 12:51:46 +00:00
Richard Osborne 2f14b0bb1d Add XCore intrinsic for crc8.
llvm-svn: 132340
2011-05-31 16:24:49 +00:00
Richard Osborne 542f9a2bcf Add XCore intrinsic for crc32.
llvm-svn: 132336
2011-05-31 14:47:36 +00:00
Richard Osborne 36d027f7f6 Convert test to FileCheck.
llvm-svn: 132335
2011-05-31 14:00:05 +00:00
Bruno Cardoso Lopes 98fc4c8bbc This patch implements atomic intrinsics atomic.load.add (sub,and,or,xor,
nand), atomic.swap and atomic.cmp.swap, all in i8, i16 and i32 versions.
The intrinsics are implemented by creating pseudo-instructions, which are
then expanded in the method MipsTargetLowering::EmitInstrWithCustomInserter.

Patch by Sasa Stankovic.

llvm-svn: 132323
2011-05-31 02:54:07 +00:00
Bruno Cardoso Lopes bf3c1251e0 This patch implements the thread local storage. Implemented are General
Dynamic, Initial Exec and Local Exec TLS models.

Patch by Sasa Stankovic

llvm-svn: 132322
2011-05-31 02:53:58 +00:00
Rafael Espindola 08600bcf65 Use the dwarf->llvm mapping to print register names in the cfi
directives.

Fixes PR9826.

llvm-svn: 132317
2011-05-30 20:20:15 +00:00
Jakob Stoklund Olesen dd6fcc4e46 Fix PR10046 by updating LiveVariables kill info when splitting live ranges.
This only affects targets like Mips where branch instructions may kill virtual
registers. Most other targets branch on flag values, so virtual registers are
not involved.

The problem is that MachineBasicBlock::updateTerminator deletes branches and
inserts new ones while LiveVariables keeps a list of pointers to instructions
that kill virtual registers. That list wasn't properly updated in
MBB::SplitCriticalEdge.

llvm-svn: 132298
2011-05-29 20:10:28 +00:00
John McCall 7d84ece09b On Darwin ARM, set the UNWIND_RESUME libcall to _Unwind_SjLj_Resume.
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).

Take 2, now with more basic competence.

llvm-svn: 132295
2011-05-29 19:50:32 +00:00
John McCall e64371b932 I didn't mean to commit these residues of a personal project.
llvm-svn: 132293
2011-05-29 19:41:56 +00:00
John McCall 085d891d80 On Darwin ARM, set the UNWIND_RESUME libcall to _Unwind_SjLj_Resume.
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).

llvm-svn: 132291
2011-05-29 19:39:04 +00:00
Bruno Cardoso Lopes 325110f30d Add support for ARM ldrexd/strexd intrinsics. They both use i32 register pairs
to load/store i64 values. Since there's no current support to explicitly
declare such restrictions, implement it by using specific hardcoded register
pairs during isel.

llvm-svn: 132248
2011-05-28 04:07:29 +00:00
Eric Christopher d00e8ad803 Implement the 'M' output modifier for arm inline asm. This is fairly
register allocation dependent and will occasionally break. WIP in the
register allocator to model paired/etc registers.

rdar://9119939

llvm-svn: 132242
2011-05-28 01:40:44 +00:00
Akira Hatanaka b406843fe5 Define a wrapper node for target constant nodes (tglobaladdr, etc.).
Need this to prevent emitting illegal conditional move instructions. 

llvm-svn: 132240
2011-05-28 01:07:07 +00:00
Cameron Zwarich 1d553a2cc4 Fix the remaining atomic intrinsics to use the right register classes on Thumb2,
and add some basic tests for them.

llvm-svn: 132235
2011-05-27 23:54:00 +00:00
Eli Friedman 873106a932 Force a triple to make this test pass on Darwin.
llvm-svn: 132228
2011-05-27 23:12:48 +00:00
Cameron Zwarich 75d99e4b70 Add a GR32_NOREX_NOSP register class and fix a bug where getMatchingSuperRegClass()
was saying that the matching superregister class of GR32_NOREX in GR64_NOREX_NOSP
is GR64_NOREX, which drops the NOSP constraint. This fixes PR10032.

llvm-svn: 132225
2011-05-27 22:26:04 +00:00
Rafael Espindola d23bfb8a7a Make size computation less brittle.
llvm-svn: 132222
2011-05-27 22:05:41 +00:00
Jakob Stoklund Olesen 2348f3133f Make room for register allocation to improve.
llvm-svn: 132213
2011-05-27 20:15:06 +00:00
Evan Cheng 518bcd0ef4 Don't use movw / movt for iOS static codegen for now to workaround some tools issues. rdar://9514789
llvm-svn: 132211
2011-05-27 20:11:27 +00:00
Jakob Stoklund Olesen 63a9cef5c2 Delete a test that is no longer relevant.
According to PR2536, the old spiller had trouble with the IMPLICIT_DEF in this
code:

  %reg1028<def> = MOV16rm %reg0, 1, %reg0, <ga:g_5>, Mem:LD(2,2) [g_5 + 0]
  %reg1039<def> = IMPLICIT_DEF
  %reg1038<def> = INSERT_SUBREG %reg1039, %reg1028, 2
  %reg1025<def> = AND32ri %reg1038, 65534, %%EFLAGS<imp-def>

However, today we emit a zero-extending load instead:

  %vreg10<def> = MOVZX32rm16 %noreg, 1, %noreg, <ga:@g_5>, %noreg; %mem:LD2[@g_5] GR32:%vreg10
  %vreg0<def> = AND32ri %vreg10, 65534, %%EFLAGS<imp-def,dead>; %GR32:%vreg0,%vreg10

This makes the test pointless since it no longer creates the spiller hazard.

llvm-svn: 132210
2011-05-27 20:02:42 +00:00
Evan Cheng 97c9f84f68 Add iOS test
llvm-svn: 132203
2011-05-27 19:04:21 +00:00
Eli Friedman 3a8d9625b0 And fix the test in r132194.
llvm-svn: 132196
2011-05-27 18:14:28 +00:00
Eli Friedman fe84bd659c Fix a silly mistake (which trips over an assertion) in r132099. rdar://9515076
llvm-svn: 132194
2011-05-27 18:02:04 +00:00
Devang Patel 3c6aed2d98 Select DW_AT_const_value size based on variable size.
llvm-svn: 132193
2011-05-27 16:45:18 +00:00
Cameron Zwarich 34ef49dc74 Fix PR10029 - VerifyCoalescing failure on patterns_dfa.c of 445.gobmk.
llvm-svn: 132181
2011-05-27 05:04:51 +00:00
Chad Rosier b362884ca9 Renamed llvm.x86.sse42.crc32 intrinsics; crc64 doesn't exist.
crc32.[8|16|32] have been renamed to .crc32.32.[8|16|32] and
crc64.[8|16|32] have been renamed to .crc32.64.[8|64].

llvm-svn: 132163
2011-05-26 23:13:19 +00:00
Devang Patel 42ddaa10d3 During branch folding avoid inserting redundant DBG_VALUE machine instructions.
llvm-svn: 132148
2011-05-26 21:47:59 +00:00
Akira Hatanaka aa560006ed Add support for C++ exception handling.
llvm-svn: 132131
2011-05-26 18:59:03 +00:00
Eli Friedman c48f7c212e Fix test on Windows.
llvm-svn: 132126
2011-05-26 18:00:32 +00:00
Stuart Hastings 493a12bf5e Reverting 132105: it broke some LLVM-GCC DejaGNU tests.
llvm-svn: 132108
2011-05-26 04:09:49 +00:00
Stuart Hastings 276f231c2f Correctly handle a one-word struct passed byval on x86_64.
rdar://problem/6920088

llvm-svn: 132105
2011-05-26 02:44:56 +00:00
Eli Friedman c70355195c Rewrite fast-isel integer cast handling to handle more cases, and to be simpler and more consistent.
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).

rdar://9437928 .

llvm-svn: 132099
2011-05-25 23:49:02 +00:00
Akira Hatanaka fa63d3096d Define WeakRefDirective.
llvm-svn: 132098
2011-05-25 23:30:30 +00:00
Eric Christopher 8c5e4192e6 Implement the 'm' modifier. Note that it only works for memory operands.
Part of rdar://9119939

llvm-svn: 132081
2011-05-25 20:51:58 +00:00
Akira Hatanaka 44eba3ac49 Custom-lower FCOPYSIGN nodes.
llvm-svn: 132074
2011-05-25 19:32:07 +00:00
Cameron Zwarich 3088e0a179 Make tTAILJMPr/tTAILJMPrND emit a tBX without a preceding MOV of PC to LR. This
fixes <rdar://problem/9495913>

llvm-svn: 132042
2011-05-25 04:45:27 +00:00
Rafael Espindola fc9bae6f8b Replace the -unwind-tables option with a per function flag. This is more
LTO friendly as we can now correctly merge files compiled with or without
-fasynchronous-unwind-tables.

llvm-svn: 132033
2011-05-25 03:44:17 +00:00
Akira Hatanaka aac670c1c8 Fix lowering of DYNAMIC_STACKALLOC nodes.
llvm-svn: 132030
2011-05-25 02:20:00 +00:00
Eric Christopher 1b724948e9 Implement the arm 'L' asm modifier.
Part of rdar://9119939

llvm-svn: 132024
2011-05-24 23:27:13 +00:00
Eric Christopher b1dda56ac2 Implement the immediate part of the 'B' modifier.
Part of rdar://9119939

llvm-svn: 132023
2011-05-24 23:15:43 +00:00
Eric Christopher 7617883ce3 Add support for the arm 'y' asm modifier.
Fixes part of rdar://9444657

llvm-svn: 132011
2011-05-24 22:10:34 +00:00
Akira Hatanaka 2486729839 Test case for r132003.
llvm-svn: 132005
2011-05-24 21:28:18 +00:00
Akira Hatanaka ce4037ebcf Fix test case.
llvm-svn: 131988
2011-05-24 19:37:15 +00:00
Akira Hatanaka 0f30561bae Revision 131986 test case.
llvm-svn: 131987
2011-05-24 19:29:37 +00:00
Rafael Espindola 0f33be1b87 Fix the defaults for .eh_frame. We were marking it as writable.
llvm-svn: 131951
2011-05-24 02:50:20 +00:00
Evan Cheng 88f9137fd7 - Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
  when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.

rdar://9490949

llvm-svn: 131948
2011-05-24 01:48:22 +00:00
Akira Hatanaka 6af5bd2537 Add pattern for double-to-integer conversion. Patch by Sasa Stankovic.
llvm-svn: 131927
2011-05-23 22:16:43 +00:00
Dan Gohman 6c4a319088 When checking for signed multiplication overflow, watch out for INT_MIN and -1.
This fixes PR9845.

llvm-svn: 131919
2011-05-23 21:07:39 +00:00
Akira Hatanaka f9e5750fc8 Change StackDirection from StackGrowsUp to StackGrowsDown.
The following improvements are accomplished as a result of applying this patch:
- Fixed frame objects' offsets (relative to either the virtual frame pointer or
  the stack pointer) are set before instruction selection is completed. There is
  no need to wait until Prologue/Epilogue Insertion is run to set them.
- Calculation of final offsets of fixed frame objects is straightforward. It is
  no longer necessary to assign negative offsets to fixed objects for incoming
  arguments in order to distinguish them from the others.
- Since a fixed object has its relative offset set during instruction
  selection, there is no need to conservatively set its alignment to 4.
- It is no longer necessary to reorder non-fixed frame objects in 
  MipsFrameLowering::adjustMipsStackFrame.

llvm-svn: 131915
2011-05-23 20:16:59 +00:00
Devang Patel 9987d3098b Test case for r131908.
llvm-svn: 131909
2011-05-23 17:49:29 +00:00
Devang Patel c4d9a84159 While replacing all uses of a SDValue with another value, do not forget to transfer SDDbgValue.
llvm-svn: 131907
2011-05-23 17:35:08 +00:00
Cameron Zwarich bc90690b24 Fix <rdar://problem/9476260> by having tail calls always generate 32-bit branches
in Darwin Thumb2 code. Tail calls are already disabled on Thumb1.

llvm-svn: 131894
2011-05-23 01:57:17 +00:00
Renato Golin 4cd5187f5b RTABI chapter 4.3.4 specifies __eabi_mem* calls. Specifically, __eabi_memset accepts parameters (ptr, size, value) in a different order than GNU's memset (ptr, value, size), therefore the special lowering in AAPCS mode. Implementation by Evzen Muller.
llvm-svn: 131868
2011-05-22 21:41:23 +00:00
Benjamin Kramer 2fd48f2730 Implement mulo x, 2 -> addo x, x in DAGCombiner.
llvm-svn: 131800
2011-05-21 18:31:55 +00:00
Benjamin Kramer e08fb1dce9 Merge and FileCheckize test cases.
llvm-svn: 131799
2011-05-21 18:31:48 +00:00
Eli Friedman 60afcc2a6f Add fast-isel support for byval calls on x86.
llvm-svn: 131764
2011-05-20 22:21:04 +00:00
Stuart Hastings 91f1d24736 Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends.
rdar://problem/8614450

llvm-svn: 131746
2011-05-20 19:04:40 +00:00
Akira Hatanaka 43407fe633 Make $fp and $ra callee-saved registers and let PrologEpilogInserter handle
saving and restoring them.

llvm-svn: 131745
2011-05-20 18:39:33 +00:00
Chad Rosier ad00f3d0b9 Fixed regression due to commit 131709, which disables vararg tail call optimizations on Win64
llvm-svn: 131740
2011-05-20 17:49:39 +00:00
Benjamin Kramer 0bf26746d9 Rename the "sandybridge" subtarget to "corei7-avx", for GCC compatibility.
llvm-svn: 131730
2011-05-20 15:11:26 +00:00
Cameron Zwarich e0a52df6e5 Fix PR9960 by teaching SimpleRegisterCoalescing::AdjustCopiesBackFrom() to preserve
the phikill flag.

llvm-svn: 131717
2011-05-20 03:54:04 +00:00
Akira Hatanaka fe4f9d5977 Fix bug in which nodes that write to argument registers do not get glued with the JALR node. Patch by Sasa Stankovic
llvm-svn: 131714
2011-05-20 02:30:51 +00:00
Chad Rosier 552f8c4819 Don't attempt to tail call optimize for Win64.
llvm-svn: 131709
2011-05-20 00:59:28 +00:00
Evan Cheng e8d2e9eb35 Revert r131664 and fix it in instcombine instead. rdar://9467055
llvm-svn: 131708
2011-05-20 00:54:37 +00:00
Eli Friedman 22da799428 Add fast-isel support for zeroext and signext ret instructions on x86.
llvm-svn: 131689
2011-05-19 22:16:13 +00:00
Eric Christopher 4014e5e208 Oddly people want to use the 'r' constraint for fp constants on x86.
Fixes rdar://9218925
Fixes PR9601

llvm-svn: 131682
2011-05-19 21:33:47 +00:00
Eli Friedman e53a77d3a6 Fix up this test to use explicit triples (Win64 passes a different number of arguments in registers).
llvm-svn: 131676
2011-05-19 21:13:08 +00:00
Akira Hatanaka 9e6a8cca5d Align i64 arguments to 64 bit boundaries.
llvm-svn: 131668
2011-05-19 20:29:48 +00:00
Evan Cheng 2b9bd38678 crc32 with 64-bit output zeros upper 32-bits. rdar://9467055
llvm-svn: 131664
2011-05-19 18:57:12 +00:00
Stuart Hastings ae012a7525 Move test to Transforms/InstCombine.
llvm-svn: 131634
2011-05-19 05:53:22 +00:00
Tanya Lattner 1d11720ae4 Handle perfect shuffle case that generates a vrev for vectors of floats.
Add test case.

llvm-svn: 131582
2011-05-18 21:44:54 +00:00
Chad Rosier f4e832b14e Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible.
llvm-svn: 131560
2011-05-18 19:59:50 +00:00
Stuart Hastings 51d696766c An imminent fix to the x86_64 byval logic will expose a flaw in the
x86_64 sibcall logic.  I've filed PR9943 for the sibcall problem, and
this patch alters the testcase to work around the flaw.  When PR9943
is fixed, this patch should be reverted.

llvm-svn: 131557
2011-05-18 19:19:17 +00:00
Eli Friedman 3f46c3e702 Force a triple on a couple of tests; we don't support fast-isel of ret on Win64.
llvm-svn: 131540
2011-05-18 17:16:37 +00:00
Stuart Hastings 38849debb5 Merge pmovzx test case into existing file.
llvm-svn: 131539
2011-05-18 17:02:04 +00:00
Justin Holewinski bbdcd17d44 PTX: add flag to disable mad/fma selection
Patch by Dan Bailey

llvm-svn: 131537
2011-05-18 15:42:23 +00:00
Tanya Lattner 48b182c3a4 In r131488 I misunderstood how VREV works. It splits the vector in half and splits each half. Therefore, the real problem was that we were using a VREV64 for a 4xi16, when we should have been using a VREV32.
Updated test case and reverted change to the PerfectShuffle Table.

llvm-svn: 131529
2011-05-18 06:42:21 +00:00
Eli Friedman 7d7ad8374f Make some of the fast-isel tests actually test fast-isel (and fix test failures).
llvm-svn: 131510
2011-05-18 00:00:10 +00:00
Stuart Hastings 5bd18b6638 X86 pmovsx/pmovzx ignore the upper half of their inputs.
rdar://problem/6945110

llvm-svn: 131493
2011-05-17 22:13:31 +00:00
Tanya Lattner c7e291b354 vrev is incorrectly defined in the perfect shuffle table. The ordering is backwards (should be 0x3210 versus 0x1032) which exposed a bug when doing a shuffle on a 4xi16. I've attached a test case.
llvm-svn: 131488
2011-05-17 20:48:40 +00:00
Galina Kistanova dd45646a47 Move test for appropriate directory.
llvm-svn: 131477
2011-05-17 19:06:43 +00:00
Eli Friedman 7b27942fe7 Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872.
This is r131438 with a couple small fixes.

llvm-svn: 131474
2011-05-17 18:29:03 +00:00
Eli Friedman 7335e8a720 Back out r131444 and r131438; they're breaking nightly tests. I'll look into
it more tomorrow.

llvm-svn: 131451
2011-05-17 02:36:59 +00:00
Eli Friedman e5f7f26df0 Fix test.
llvm-svn: 131444
2011-05-17 00:39:14 +00:00
Evan Cheng 54459240e3 Add target triple so test doesn't fail on Windows machines.
llvm-svn: 131439
2011-05-17 00:15:58 +00:00
Eli Friedman 83ba150f3a Add x86 fast-isel for calls returning first-class aggregates. rdar://9435872.
llvm-svn: 131438
2011-05-17 00:13:47 +00:00
Jakob Stoklund Olesen 4edf17d91f Teach LiveInterval::isZeroLength about null SlotIndexes.
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.

This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic.  Prioritizing
register allocation according to spill weight can cause more registers to be
used.

llvm-svn: 131436
2011-05-16 23:50:05 +00:00
Eli Friedman d4a3609d30 Remove dead code. Fix associated test to use FileCheck.
llvm-svn: 131424
2011-05-16 21:28:22 +00:00
Eli Friedman a4d4a0162d Make fast-isel work correctly s/uadd.with.overflow intrinsics.
llvm-svn: 131420
2011-05-16 21:06:17 +00:00
Eli Friedman 9ac944774f Basic fast-isel of extractvalue. Not too helpful on its own, given the IR clang generates for cases like this, but it should become more useful soon.
llvm-svn: 131417
2011-05-16 20:27:46 +00:00
Rafael Espindola df9db7ed92 Don't produce a vmovntdq if we don't have AVX support.
llvm-svn: 131330
2011-05-14 00:30:01 +00:00
Rafael Espindola e53b7d1a11 Make codegen able to handle values of empty types. This is one way
to fix PR9900. I will keep it open until sable is able to comment on it.

llvm-svn: 131294
2011-05-13 15:18:06 +00:00
Stuart Hastings aa02c0847d Since I can't reproduce the failures from 131261, re-trying with a
simplified version.  <rdar://problem/9298790>

llvm-svn: 131274
2011-05-13 00:51:54 +00:00
Stuart Hastings 8d57d8ea64 Revert 131266 and 131261 due to buildbot complaints.
rdar://problem/9298790

llvm-svn: 131269
2011-05-13 00:15:17 +00:00
Stuart Hastings ef4940254f Tweak 131261 (thumb2-cbnz.ll) to generate the intended cbnz.
rdar://problem/9298790

llvm-svn: 131266
2011-05-13 00:10:03 +00:00
Stuart Hastings 89f1b47e3a Non-fast-isel followup to 129634; correctly handle branches controlled
by non-CMP expressions.  The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester.  Alas, the ARM assembly would be very difficult to check with
FileCheck.

The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790

llvm-svn: 131261
2011-05-12 23:36:41 +00:00
Galina Kistanova 9e56e51fab Correction. Use explicit target triple in the test.
llvm-svn: 131252
2011-05-12 21:55:34 +00:00
Evan Cheng 43054e6159 Re-enable branchfolding common code hoisting optimization. Fixed a liveness test bug and also taught it to update liveins.
llvm-svn: 131241
2011-05-12 20:30:01 +00:00
Stuart Hastings 114ecbd0f4 Move this test to CodeGen/Thumb. rdar://problem/9416774
llvm-svn: 131196
2011-05-11 19:41:28 +00:00
Devang Patel 34a6620748 Identify end of prologue (and beginning of function body) using DW_LNS_set_prologue_end line table opcode.
llvm-svn: 131194
2011-05-11 19:22:19 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Nadav Rotem 8f971c27fb Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector.
llvm-svn: 131179
2011-05-11 08:12:09 +00:00
Rafael Espindola 2a09d65979 Revert 131172 as it is causing clang to miscompile itself. I will try
to provide a reduced testcase.

llvm-svn: 131176
2011-05-11 03:27:17 +00:00
Evan Cheng 05fc35e275 Add a late optimization to BranchFolding that hoist common instruction sequences
at the start of basic blocks to their common predecessor. It's actually quite
common (e.g. about 50 times in JM/lencod) and has shown to be a nice code size
benefit. e.g.

        pushq   %rax
        testl   %edi, %edi
        jne     LBB0_2
## BB#1:
        xorb    %al, %al
        popq    %rdx
        ret
LBB0_2:
        xorb    %al, %al
        callq   _foo
        popq    %rdx
        ret

=>

        pushq   %rax
        xorb    %al, %al
        testl   %edi, %edi
        je      LBB0_2
## BB#1:
        callq   _foo
LBB0_2:
        popq    %rdx
        ret

rdar://9145558

llvm-svn: 131172
2011-05-11 01:03:01 +00:00
Rafael Espindola 19c1a56287 Produce a __debug_frame section on darwin ARM when appropriate.
llvm-svn: 131151
2011-05-10 21:04:45 +00:00
Justin Holewinski 3c0447259c PTX: add test cases for cvt, fneg, and selp
Patch by Dan Bailey

llvm-svn: 131128
2011-05-10 14:53:13 +00:00
Benjamin Kramer d724a590e5 X86: Add a bunch of peeps for add and sub of SETB.
"b + ((a < b) ? 1 : 0)" compiles into
	cmpl	%esi, %edi
	adcl	$0, %esi
instead of
	cmpl	%esi, %edi
	sbbl	%eax, %eax
	andl	$1, %eax
	addl	%esi, %eax

This saves a register, a false dependency on %eax
(Intel's CPUs still don't ignore it) and it's shorter.

llvm-svn: 131070
2011-05-08 18:36:07 +00:00
Jakob Stoklund Olesen a5c889982a Emit a proper error message when register allocators run out of registers.
This can't be just an assertion, users can always write impossible inline
assembly. Such an assembly statement should be included in the error message.

llvm-svn: 131024
2011-05-06 21:58:30 +00:00
Justin Holewinski 11d70b6b32 PTX: add PTX 2.3 language target
Patch by Wei-Ren Chen

llvm-svn: 130980
2011-05-06 11:40:36 +00:00
Eli Friedman 5401962643 Re-revert r130877; it's apparently causing a regression on 197.parser,
possibly related to cbnz formation.

llvm-svn: 130977
2011-05-06 05:23:07 +00:00
Rafael Espindola a4982bddf3 Don't produce a __debug_frame.
I tested both gdb on a bootstrapped clang and and the gdb testsuite on OS X (snow leopard)
and both are happy using __eh_frame.

llvm-svn: 130937
2011-05-05 18:43:39 +00:00
Eli Friedman 441a01a2b8 Avoid extra vreg copies for arguments passed in registers. Specifically, this can make MachineCSE more effective in some cases (especially in small functions). PR8361 / part of rdar://problem/8259436 .
llvm-svn: 130928
2011-05-05 16:53:34 +00:00
Jakob Stoklund Olesen 17d4f9bbcc Prepare remaining tests for -join-physreg going away.
llvm-svn: 130893
2011-05-04 23:54:59 +00:00
Jakob Stoklund Olesen 369bddf5ad Fix a batch of x86 tests to be coalescer independent.
Most of these tests require a single mov instruction that can come either before
or after a 2-addr instruction. -join-physregs changes the behavior, but the
results are equivalent.

llvm-svn: 130891
2011-05-04 23:54:51 +00:00
Dan Gohman dd550305e6 Give this test an explicit register allocator, so that it can work even if
the default register allocator is changed.

llvm-svn: 130883
2011-05-04 23:14:02 +00:00
Bill Wendling 2a40131f6b SjLj EH could produce a machine basic block that legitimately has more than one
landing pad as its successor.

SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>

llvm-svn: 130881
2011-05-04 22:54:05 +00:00
Eli Friedman 0fe4608af2 Re-commit r130862 with a minor change to avoid an iterator running off the edge in some cases.
Original message:

Teach MachineCSE how to do simple cross-block CSE involving physregs.  This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .

llvm-svn: 130877
2011-05-04 22:10:36 +00:00
Galina Kistanova e53ae508ec This test fails on ARM. The test shouldn't explicitly specify alignment (and alignment 4 is wrong) and requires hard-float.
llvm-svn: 130875
2011-05-04 21:57:44 +00:00
Eli Friedman 3bd79ba856 Back out r130862; it appears to be breaking bootstrap.
llvm-svn: 130867
2011-05-04 20:48:42 +00:00
Eli Friedman a16fc2fec0 Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130862
2011-05-04 19:54:24 +00:00
Jakob Stoklund Olesen 28a93a49bb Fix more register and coalescing dependencies.
llvm-svn: 130859
2011-05-04 19:02:11 +00:00
Jakob Stoklund Olesen d7fd7bfc31 Explicitly request physreg coalesing for a bunch of Thumb2 unit tests.
These tests all follow the same pattern:

	mov	r2, r0
	movs	r0, #0
	$CMP	r2, r1
	it	eq
	moveq	r0, #1
	bx	lr

The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:

	$CMP	r0, r1
	mov.w	r0, #0
	it	eq
	moveq	r0, #1
	bx	lr

So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.

llvm-svn: 130858
2011-05-04 19:02:07 +00:00
Jakob Stoklund Olesen e7528c45ea FileCheckize and break dependence on coalescing order.
llvm-svn: 130856
2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen 067ba3c23c Explicitly request -join-physregs for some tests that depend on it.
llvm-svn: 130855
2011-05-04 19:01:59 +00:00
Devang Patel 39ecf816c5 Do not emit location expression size twice.
llvm-svn: 130854
2011-05-04 19:00:57 +00:00
Akira Hatanaka 3bace5d223 Remove LLVM IR metadata in test case committed in r130847.
llvm-svn: 130849
2011-05-04 18:28:36 +00:00
Akira Hatanaka 23e8ecf125 Prevent instructions using $gp from being placed between a jalr and the instruction that restores the clobbered $gp.
llvm-svn: 130847
2011-05-04 17:54:27 +00:00
Jakob Stoklund Olesen f1b401800a Don't depend on the physreg coalescing order.
llvm-svn: 130818
2011-05-04 01:01:47 +00:00
Jakob Stoklund Olesen 5b5abb4ea1 Don't run this test through -regalloc=basic.
The basic allocator is really bad about hinting, so it doesn't eliminate all
copies when physreg joining is disabled.

llvm-svn: 130817
2011-05-04 01:01:44 +00:00
Jakob Stoklund Olesen d3b2f44c9d Fix register-dependent XCore tests
llvm-svn: 130816
2011-05-04 01:01:41 +00:00
Jakob Stoklund Olesen 7f7fc82141 Fix register-dependent test in MSP430.
llvm-svn: 130815
2011-05-04 01:01:39 +00:00
Jakob Stoklund Olesen 51b35f7bb1 Fix a bunch of ARM tests to be register allocation independent.
llvm-svn: 130800
2011-05-03 22:31:21 +00:00
Bill Wendling db0996c822 Replace the "movnt" intrinsics with a native store + nontemporal metadata bit.
<rdar://problem/8460511>

llvm-svn: 130791
2011-05-03 21:11:17 +00:00
Evan Cheng 93b5cdc5ab Make the test less likely to fail with minor changes.
llvm-svn: 130778
2011-05-03 19:09:32 +00:00
Bob Wilson c5242b0e78 Remove test for iOS divmod function, since that is disabled for now.
llvm-svn: 130769
2011-05-03 17:54:49 +00:00
Bruno Cardoso Lopes 168c9005b5 Add a few ARM coprocessor intrinsics. Testcases included
llvm-svn: 130763
2011-05-03 17:29:22 +00:00
Dan Gohman 6136e94897 Add an unfolded offset field to LSR's Formula record. This is used to
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.

llvm-svn: 130743
2011-05-03 00:46:49 +00:00
Rafael Espindola 5164e6e8b2 Add 130690 back.
llvm-svn: 130693
2011-05-02 15:58:16 +00:00
Rafael Espindola a392475865 Revert while I debug the tests that use march but not mtriple.
llvm-svn: 130691
2011-05-02 15:42:31 +00:00
Rafael Espindola c2aad4f2a3 Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time.
llvm-svn: 130690
2011-05-02 15:00:52 +00:00
Rafael Espindola fc8223670a Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm.
llvm-svn: 130658
2011-05-01 15:44:13 +00:00
Rafael Espindola 750cb61553 GCC uses a different encoding of pointers in the FDE when using
-fno-dwarf2-cfi-asm. Implement the same behavior.

llvm-svn: 130637
2011-05-01 04:49:54 +00:00
Rafael Espindola b7c2286055 Revert the previous patch while I figure out how to make llvm-gcc
less agressive about disabling cfi on linux :-(

llvm-svn: 130626
2011-04-30 23:03:44 +00:00
Rafael Espindola 5265bc483e Enable CFI on OS X.
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.

The only two differences I know of are:

* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so we if really
this at -O0 we should probably do it at the IL level.

llvm-svn: 130623
2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen f5eaa8dc62 Allow folded spills in test.
llvm-svn: 130599
2011-04-30 08:00:50 +00:00
Jakob Stoklund Olesen edfabc9aad Weekly fix of register allocation dependent unit tests.
llvm-svn: 130567
2011-04-30 01:37:52 +00:00
Eli Friedman 4105ed1523 Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM.
llvm-svn: 130560
2011-04-29 23:34:52 +00:00
Eli Friedman 328bad02fa Switch to ImmLeaf (which can be used by FastISel) for a few more common ARM/Thumb2 patterns.
llvm-svn: 130552
2011-04-29 22:48:03 +00:00
Eli Friedman dd937843d3 Fix run-line, again. :(
llvm-svn: 130540
2011-04-29 21:33:03 +00:00
Eli Friedman 86caced370 Re-committing r130454, which does not in fact break anything.
Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130539
2011-04-29 21:22:56 +00:00
Eric Christopher 8d46b47787 Add trunc->branch support, this won't help with clang's i8->i1 truncations
for bools, but is a start.

llvm-svn: 130534
2011-04-29 20:02:39 +00:00
Rafael Espindola 697edc89a5 Change DwarfCFIException's member variables to track what it actually
emmits: .cfi_personality, .cfi_lsda and the moves.

llvm-svn: 130503
2011-04-29 14:48:51 +00:00
Andrew Trick e794e17524 Teach Thumb2 isel to fold and->rotr ==> ROR.
Generalization of Nate Begeman's patch!

llvm-svn: 130502
2011-04-29 14:18:15 +00:00
Andrew Trick 65266ed4d7 Combine thumb2-ror tests.
llvm-svn: 130498
2011-04-29 14:02:41 +00:00
Eli Friedman 517728b1ae Revert r130454; apparently this doesn't actually work.
llvm-svn: 130462
2011-04-28 23:55:14 +00:00
Eli Friedman 37b9ede969 Fix runline.
llvm-svn: 130455
2011-04-28 23:12:24 +00:00
Eli Friedman e4ecd42926 Fix a rather obscure crash caused by ARM fast-isel generating code which redefines a register.
rdar://problem/9338332 .

llvm-svn: 130454
2011-04-28 23:03:25 +00:00
Eli Friedman 7cd5101ad3 fast-isel sret calls, try 2. We actually do need to do something on x86-32. rdar://problem/9303592 .
llvm-svn: 130429
2011-04-28 20:19:12 +00:00
Eli Friedman 3cf6d4032a Actually revert r130348 correctly.
llvm-svn: 130418
2011-04-28 18:20:24 +00:00
Eli Friedman d5a80ca3c8 Revert r130348; causing buildbot issues on x86-32.
llvm-svn: 130412
2011-04-28 18:06:10 +00:00
Devang Patel 3e021533cd Teach dwarf writer to handle complex address expression for .debug_loc entries.
This fixes clang generated blocks' variables' debug info.
Radar 9279956.

llvm-svn: 130373
2011-04-28 02:22:40 +00:00
Eli Friedman 33c133919a Fix a silly mistake in r130338.
llvm-svn: 130360
2011-04-28 00:42:03 +00:00
Justin Holewinski 18e6ac83ea PTX: support for bitwise operations on predicates
- selection of bitwise preds (AND, OR, XOR)
- new bitwise.ll test

Patch by Dan Bailey

llvm-svn: 130353
2011-04-28 00:19:51 +00:00
Eli Friedman 8bd572fc58 fast-isel sret. We actually don't need to do anything special on x86. :) rdar://problem/9303592 .
llvm-svn: 130348
2011-04-27 23:58:52 +00:00
Eli Friedman 406c471b69 Make the fast-isel code for literal 0.0 a bit shorter/faster, since 0.0 is common. rdar://problem/9303592 .
llvm-svn: 130338
2011-04-27 22:41:55 +00:00
Evan Cheng 9808d31b9e If converter was being too cute. It look for root BBs (which don't have
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply do a linear
traversal of all BBs work just fine.

rdar://9344645

llvm-svn: 130324
2011-04-27 19:32:43 +00:00
Jakob Stoklund Olesen 71d3b895ba Also add <imp-def> operands for defined and dead super-registers when rewriting.
We cannot rely on the <imp-def> operands added by LiveIntervals in all cases as
demonstrated by the test case.

llvm-svn: 130313
2011-04-27 17:42:31 +00:00
Eli Friedman 0eea0293d9 Fix an edge case involving branches in fast-isel on x86.
rdar://problem/9303306 .

llvm-svn: 130272
2011-04-27 01:34:27 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Evan Cheng dbb86b8108 This test should be in MC. It breaks with changes to scheduling / register allocation so it's being removed.
llvm-svn: 130243
2011-04-26 21:09:04 +00:00
Benjamin Kramer 1d4c835089 Force a triple on this test to unbreak windows buildbots.
llvm-svn: 130226
2011-04-26 18:47:43 +00:00
Dan Gohman 7da91aee83 Fast-isel support for simple inline asms.
llvm-svn: 130205
2011-04-26 17:18:34 +00:00
Rafael Espindola 580eebaa20 Add test for PR9743.
llvm-svn: 130198
2011-04-26 14:17:42 +00:00
Chris Lattner 189ca1498f don't emit the symbol name twice for local bss and common
symbols.  For example, don't emit:
        .comm   _i,4,2                  ## @i
                                        ## @i

instead emit:
        .comm   _i,4,2                  ## @i

llvm-svn: 130192
2011-04-26 06:14:13 +00:00
Eric Christopher 238a21f2d5 Make this test disable fast isel as it's not needed.
llvm-svn: 130165
2011-04-25 22:39:46 +00:00
Akira Hatanaka 0e7ee666b7 Lower BlockAddress node when relocation-model is static.
llvm-svn: 130131
2011-04-25 17:10:45 +00:00
Devang Patel 734f2218ac A dbg.declare may not be in entry block, even if it is referring to an incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction.
llvm-svn: 130129
2011-04-25 16:33:52 +00:00
Benjamin Kramer ba446cc12a Make tests more useful.
lit needs a linter ...

llvm-svn: 130126
2011-04-25 10:12:01 +00:00
Andrew Trick 76dca78cb4 Accidental function name mangling.
llvm-svn: 130050
2011-04-23 04:08:15 +00:00
Andrew Trick 0ed5778a1e Thumb2 and ARM add/subtract with carry fixes.
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.

Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.

Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).

llvm-svn: 130048
2011-04-23 03:55:32 +00:00
Andrew Trick 1a1f8d4640 whitespace
llvm-svn: 130046
2011-04-23 03:24:11 +00:00
NAKAMURA Takumi 576273cf56 test/CodeGen/X86/shrink-compare.ll: Relax expressions for Win64.
llvm-svn: 130039
2011-04-23 00:15:45 +00:00
Chris Lattner 6d277517d1 Recommit the fix for rdar://9289512 with a couple tweaks to
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
   would cause an assert.
2. The load may not be used by the current chain of instructions,
   and we could move it past a side-effecting instruction. Change
   how we process uses to define the problem away.

llvm-svn: 130018
2011-04-22 21:59:37 +00:00
Benjamin Kramer 341c11da3b DAGCombine: fold "(zext x) == C" into "x == (trunc C)" if the trunc is lossless.
On x86 this allows to fold a load into the cmp, greatly reducing register pressure.
  movzbl	(%rdi), %eax
  cmpl	$47, %eax
->
  cmpb	$47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
2011-04-22 18:47:44 +00:00
Benjamin Kramer 4c81624735 X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) & C1. (Part of PR5039)
This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
uint64_t foo(uint64_t x) { return (x&1) << 42; }
which used to compile into bloated code:
	shlq	$42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
	movabsq	$4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
	andq	%rdi, %rax              ## encoding: [0x48,0x21,0xf8]
	ret                             ## encoding: [0xc3]

with this patch we can fold the immediate into the and:
	andq	$1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	shlq	$42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
	ret                             ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.

llvm-svn: 129990
2011-04-22 15:30:40 +00:00
Evan Cheng c0d2004e3c In Thumb2 mode, lower frame indix references to:
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
When the offset from sp is multiple of 4 and in range of 0-1020.
This saves code size by utilizing 16-bit instructions.

rdar://9321541

llvm-svn: 129971
2011-04-22 01:42:52 +00:00
Devang Patel 94ad6ac13c Fix DWARF description of Q registers.
llvm-svn: 129952
2011-04-21 23:22:35 +00:00
Devang Patel 3712c14be9 Fix DWARF description of S registers.
llvm-svn: 129947
2011-04-21 22:48:26 +00:00
Devang Patel be22131c28 Test case for r129922
llvm-svn: 129934
2011-04-21 20:16:43 +00:00
Daniel Dunbar 6309828206 Revert r1296656, "Fix rdar://9289512 - not folding load into compare at -O0...",
which broke a couple GCC test suite tests at -O0.

llvm-svn: 129914
2011-04-21 16:14:46 +00:00
Che-Liang Chiou 14c48e5d66 ptx: fix parameter ordering
This patch depends on the prior fix r129908 that changes to use std::find,
rather than std::binary_search, on unordered array.

Patch by Dan Bailey

llvm-svn: 129909
2011-04-21 10:56:58 +00:00
Evan Cheng 5f1ba4cd2d Remove -use-divmod-libcall. Let targets opt in when they are available.
llvm-svn: 129884
2011-04-20 22:20:12 +00:00
Stuart Hastings 1b06a10d62 Un-XFAIL this test for ARM. <rdar://problem/7662569>
llvm-svn: 129875
2011-04-20 21:47:45 +00:00
Justin Holewinski 7d8895e767 PTX: Add intrinsics to list of built-in intrinsics, which allows them to be
used by Clang.  To help Clang integration, the PTX target has been split
     into two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
2011-04-20 15:37:17 +00:00
Eric Christopher bcaedb5ce0 Rewrite the expander for umulo/smulo to remember to sign extend the input
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).

Fixes rdar://9292577

llvm-svn: 129842
2011-04-20 01:19:45 +00:00
Daniel Dunbar ed3d5496dc llc: Eliminate a use of getDarwinMajorNumber().
- As before, there is a minor semantic change here (evidenced by the test
   change) for Darwin triples that have no version component. I debated changing
   the default behavior of isOSVersionLT, but decided it made more sense for
   triples to be explicit.

llvm-svn: 129805
2011-04-19 20:46:13 +00:00
Daniel Dunbar 4a7783b0c2 CodeGen: Eliminate a use of getDarwinMajorNumber().
- There is a minor semantic change here (evidenced by the test change) for
   Darwin triples that have no version component. I debated changing the default
   behavior of isOSVersionLT, but decided it made more sense for triples to be
   explicit.

llvm-svn: 129802
2011-04-19 20:32:39 +00:00
Bob Wilson 0858c3aaed This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
Bob Wilson d04a83f8f2 Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637.
llvm-svn: 129774
2011-04-19 18:11:52 +00:00
Bob Wilson a2881ee8a4 Avoid some 's' 16-bit instruction which partially update CPSR
(and add false dependency) when it isn't dependent on last CPSR defining
instruction. rdar://8928208

llvm-svn: 129773
2011-04-19 18:11:49 +00:00
Bob Wilson df612ba006 Avoid write-after-write issue hazards for Cortex-A9.
Add a avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.

This is currently disabled by default.  We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
2011-04-19 18:11:45 +00:00
Eli Friedman ee92a6b332 Add support for FastISel'ing varargs calls.
llvm-svn: 129765
2011-04-19 17:22:22 +00:00
Jakob Stoklund Olesen fb1249548f Tighten test case a bit.
Ideally, we would match an S-register to its containing D-register, but that
requires arithmetic (divide by 2).

llvm-svn: 129756
2011-04-19 06:14:45 +00:00
Chris Lattner 91328b317b Implement support for x86 fastisel of small fixed-sized memcpys, which are generated
en-mass for C++ PODs.  On my c++ test file, this cuts the fast isel rejects by 10x 
and shrinks the generated .s file by 5%

llvm-svn: 129755
2011-04-19 05:52:03 +00:00
Chris Lattner 5f4b783426 Implement support for fast isel of calls of i1 arguments, even though they are illegal,
when they are a truncate from something else.  This eliminates fully half of all the 
fastisel rejections on a test c++ file I'm working with, which should make a substantial
improvement for -O0 compile of c++ code.

This fixed rdar://9297003 - fast isel bails out on all functions taking bools

llvm-svn: 129752
2011-04-19 05:09:50 +00:00
Chris Lattner d7f7c93914 Handle i1/i8/i16 constant integer arguments to calls by prepromoting them.
Before we would bail out on i1 arguments all together, now we just bail on
non-constant ones.  Also, we used to emit extraneous code.  e.g. test12 was:

	movb	$0, %al
	movzbl	%al, %edi
	callq	_test12

and test13 was:
	movb	$0, %al
	xorl	%edi, %edi
	movb	%al, 7(%rsp)
	callq	_test13f

Now we get:

	movl	$0, %edi
	callq	_test12
and:
	movl	$0, %edi
	callq	_test13f

llvm-svn: 129751
2011-04-19 04:42:38 +00:00
Chris Lattner c59290a34c be layout aware, to produce:
testb	$1, %al
	je	LBB0_2
## BB#1:                                ## %if.then
	movb	$0, %al

instead of:

	testb	$1, %al
	jne	LBB0_1
	jmp	LBB0_2
LBB0_1:                                 ## %if.then
	movb	$0, %al

how 'bout that.

llvm-svn: 129749
2011-04-19 04:26:32 +00:00
Chris Lattner 2c8a4c3b1b fix rdar://9297006 - fast isel bails out on trunc to i1 -> bools cry,
a common cause of fast isel rejects on c++ code.

llvm-svn: 129748
2011-04-19 04:22:17 +00:00
Jakob Stoklund Olesen bf78618db6 Make tests register allocation independent again.
llvm-svn: 129739
2011-04-19 00:14:43 +00:00
Evan Cheng 4079133796 Do not lose mem_operands while lowering VLD / VST intrinsics.
llvm-svn: 129738
2011-04-19 00:04:03 +00:00
Eric Christopher c37aa0b26a Fix a bug where we were counting the alias sets as completely used
registers for fast allocation a different way. This has us updating
used registers only when we're using that exact register.

Fixes rdar://9207598

llvm-svn: 129711
2011-04-18 19:26:25 +00:00
Chris Lattner 48f75ad678 while we're at it, handle 'sdiv exact' of a power of 2 also,
this fixes a few rejects on c++ iterator loops.

llvm-svn: 129694
2011-04-18 07:00:40 +00:00
Chris Lattner 562d6e82bd fix rdar://9297011 - udiv by power of two causing fast-isel rejects
llvm-svn: 129693
2011-04-18 06:55:51 +00:00
Chris Lattner 07add49a4b Implement major new fastisel functionality: the matcher can now handle immediates with
value constraints on them (when defined as ImmLeaf's).  This is particularly important
for X86-64, where almost all reg/imm instructions take a i64immSExt32 immediate operand,
which has a value constraint.  Before this patch we ended up iseling the examples into
such amazing code as:

	movabsq	$7, %rax
	imulq	%rax, %rdi
	movq	%rdi, %rax
	ret

now we produce:

	imulq	$7, %rdi, %rax
	ret

This dramatically shrinks the generated code at -O0 on x86-64.

llvm-svn: 129691
2011-04-18 06:22:33 +00:00
Chris Lattner 353fda159d relax this test to just check that the lock prefix is encoded properly,
and to not rely on the register allocator's arbitrary operand choices.

llvm-svn: 129690
2011-04-18 06:15:35 +00:00
Chris Lattner b53ccb8e36 1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the 
   shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
   instead of FastEmit_ri to simplify code.

llvm-svn: 129666
2011-04-17 20:23:29 +00:00
Chris Lattner eb729d48ff fix an x86 fast isel issue where we'd completely give up on folding an address
when we have a global variable base an an index.  Instead, just give up on
folding the global variable.

Before we'd geenrate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	addq	%rdi, %rax
	movzbl	(%rax), %eax
	ret

now we generate:

_test:                                  ## @test
## BB#0:
	movq	_rtx_length@GOTPCREL(%rip), %rax
	movzbl	(%rax,%rdi), %eax
	ret

The difference is even more significant when there is a scale
involved.

This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64

llvm-svn: 129664
2011-04-17 17:47:38 +00:00
Chris Lattner 4832660b4d fix an oversight which caused us to compile the testcase (and other
less trivial things) into a dummy lea.  Before we generated:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	ret

now we produce:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	ret

This is part of rdar://9289558

llvm-svn: 129662
2011-04-17 17:12:08 +00:00
Chris Lattner 045c43855c Fix rdar://9289512 - not folding load into compare at -O0
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo.  Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:


cmpb    $0, 52(%rax)
je      LBB4_2

instead of:

movb    52(%rax), %cl
cmpb    $0, %cl
je      LBB4_2

This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and 
generating less code.

This was one of the biggest classes of missing load folding.  Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.

llvm-svn: 129656
2011-04-17 06:35:44 +00:00
Eli Friedman 55f7bf3289 Remove working entry from README.
llvm-svn: 129654
2011-04-17 02:36:27 +00:00
Chris Lattner fba7ca63cc fix rdar://9289583 - fast isel should handle non-canonical commutative binops
allowing us to fold the immediate into the 'and' in this case:

int test1(int i) {
  return 8&i;
}

llvm-svn: 129653
2011-04-17 01:16:47 +00:00
Eli Friedman 55b0acd624 PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext.
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.

llvm-svn: 129650
2011-04-16 23:25:34 +00:00
Evan Cheng b14ce09fca Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991
llvm-svn: 129633
2011-04-16 03:08:26 +00:00
Akira Hatanaka 2cb3aa30dd Re-enable test o32_cc_vararg.ll.
llvm-svn: 129616
2011-04-15 22:23:09 +00:00
Cameron Zwarich 9c65e4d69c Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate
a case involving EOR, so I only added a test for ORR.

llvm-svn: 129610
2011-04-15 21:24:38 +00:00
Rafael Espindola 9fef721830 Add this test back for Darwin.
llvm-svn: 129607
2011-04-15 21:06:27 +00:00
Cameron Zwarich 0829b3065a The AND instruction leaves the V flag unmodified, so it falls victim to the same
problem as all of the other instructions we fold with CMPs.

llvm-svn: 129602
2011-04-15 20:45:00 +00:00
Cameron Zwarich 93eae1571c Add missing register forms of instructions to the ARM CMP-folding code. This
fixes <rdar://problem/9287901>.

llvm-svn: 129599
2011-04-15 20:28:28 +00:00
Akira Hatanaka 279169771b Add pass that expands pseudo instructions into target instructions after register allocation. Define pseudos that get expanded into mtc1 or mfc1 instructions.
llvm-svn: 129594
2011-04-15 19:52:08 +00:00
Rafael Espindola a01cdb0e37 Add 129518 back with a fix for when we are producing eh just because of debug info.
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.

llvm-svn: 129571
2011-04-15 15:11:06 +00:00
NAKAMURA Takumi b5e3e9dd27 Revert r129518, "Change ELF systems to use CFI for producing the EH tables. This reduces the"
It broke several builds.

llvm-svn: 129557
2011-04-15 03:35:57 +00:00
Evan Cheng 12bb05b75b Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't
forget to right shift the source by 32 first. rdar://9287902

llvm-svn: 129556
2011-04-15 01:31:00 +00:00
Michael J. Spencer 30088ba110 Add 3DNow! intrinsics.
llvm-svn: 129551
2011-04-15 00:32:41 +00:00
Evan Cheng 44887f9c7e Follow up on r127913. Fix Thumb revsh isel. rdar://9286766
llvm-svn: 129548
2011-04-14 23:27:44 +00:00
Rafael Espindola aa2a7cd828 Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.

llvm-svn: 129518
2011-04-14 15:18:53 +00:00
Andrew Trick bfbd972b1f In the pre-RA scheduler, maintain cmp+br proximity.
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.

The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.

Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump

Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion

llvm-svn: 129508
2011-04-14 05:15:06 +00:00
Bill Wendling 410ec4aad1 As Dan pointed out, movzbl, movsbl, and friends are nicer than their alias
(movzx/movsx) because they give more information. Revert that part of the patch.

llvm-svn: 129498
2011-04-14 01:46:37 +00:00
Bill Wendling 7e07d6fb69 Have the X86 back-end emit the alias instead of what's being aliased. In most
cases, it's much nicer and more informative reading the alias.

llvm-svn: 129497
2011-04-14 01:11:51 +00:00
Cameron Zwarich 415b5e8341 Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>.
llvm-svn: 129468
2011-04-13 21:01:19 +00:00
Cameron Zwarich 9398197ef1 Fix a regression caused by r102515 where explicit alignment on globals is
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.

llvm-svn: 129466
2011-04-13 20:36:04 +00:00
Cameron Zwarich 70be27e913 Fix an obvious problem with an alignment computation. AsmPrinter actually does
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.

llvm-svn: 129432
2011-04-13 09:02:43 +00:00
Cameron Zwarich cdf59f7016 If a global variable has a specified alignment that is less than the preferred
alignment for its type, use the minimum of the specified alignment and the ABI
alignment. This fixes <rdar://problem/9275290>.

llvm-svn: 129428
2011-04-13 06:03:16 +00:00
Andrew Trick b53a00d2cb Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Bill Wendling b902f1dd88 Reapply r129401 with patch for clang.
llvm-svn: 129419
2011-04-13 00:36:11 +00:00
Eric Christopher 28f4c729f7 Temporarily revert r129408 to see if it brings the bots back.
llvm-svn: 129417
2011-04-13 00:20:59 +00:00
Eric Christopher d829f43c06 Fix a bug where we were counting the alias sets as completely used
registers for fast allocation.

Fixes rdar://9207598

llvm-svn: 129408
2011-04-12 23:23:14 +00:00
Bill Wendling dbfde42468 Revert r129401 for now. Clang is using the old way of doing things.
llvm-svn: 129403
2011-04-12 22:59:27 +00:00
Bill Wendling 47c24875a1 Remove the unaligned load intrinsics in favor of using native unaligned loads.
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.

First part of <rdar://problem/8460511>.

llvm-svn: 129401
2011-04-12 22:46:31 +00:00
Andrew Trick 1b60ad6644 Revert 129383. It causes some targets to hit a scheduler assert.
llvm-svn: 129385
2011-04-12 20:14:07 +00:00
Andrew Trick c5dd24a542 PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129383
2011-04-12 19:54:36 +00:00
Cameron Zwarich fbcd69b96a Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM
stores of arguments in the same cache line. This fixes the second half of
<rdar://problem/8674845>.

llvm-svn: 129345
2011-04-12 02:24:17 +00:00
Wesley Peck 1914c39bd4 Add scheduling information for the MBlaze backend.
llvm-svn: 129311
2011-04-11 22:31:52 +00:00
Evan Cheng ef42bea704 Look pass copies when determining whether hoisting would end up inserting more copies. rdar://9266679
llvm-svn: 129297
2011-04-11 21:09:18 +00:00
Chris Lattner 214f114aa7 look for the verboten argument slot access in any order, thanks to Frits
for pointing this out

llvm-svn: 129217
2011-04-09 17:00:34 +00:00
Chris Lattner af1bccec68 Fix a bug where RecursivelyDeleteTriviallyDeadInstructions could
delete the instruction pointed to by CGP's current instruction
iterator, leading to a crash on the testcase.  This fixes PR9578.

llvm-svn: 129200
2011-04-09 07:05:44 +00:00
Chris Lattner 418b1037b0 fix two completely broken tests, which were matching due to PR9629.
llvm-svn: 129195
2011-04-09 06:34:38 +00:00
Chris Lattner ea6afab4b0 remove a bunch of CHECK lines that aren't checking what
they thought they were, because alternation was expanding
wrong in {{}}'s.

llvm-svn: 129194
2011-04-09 06:31:06 +00:00
Chris Lattner 41c80e89f3 have dag combine zap "store undef", which can be formed during call lowering
with undef arguments.

llvm-svn: 129185
2011-04-09 02:32:02 +00:00
Chris Lattner 1c42a4d159 don't test for codegen of 'store undef'
llvm-svn: 129184
2011-04-09 02:31:26 +00:00
Evan Cheng 74d92c1924 Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
2011-04-08 21:37:21 +00:00
Evan Cheng 9a3f2772f0 Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183.
llvm-svn: 129107
2011-04-07 20:31:12 +00:00
Andrew Trick 2ad0b37318 Added a check in the preRA scheduler for potential interference on a
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.

Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing

llvm-svn: 129100
2011-04-07 19:54:57 +00:00
Akira Hatanaka d6f1c58914 Fix handling of functions with internal linkage.
llvm-svn: 129099
2011-04-07 19:51:44 +00:00
Tanya Lattner 266792a55a Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases.
llvm-svn: 129074
2011-04-07 15:24:20 +00:00
Evan Cheng a7c7b54dde Change -arm-divmod-libcall to a target neutral option.
llvm-svn: 129045
2011-04-07 00:58:44 +00:00
Owen Anderson bdff1c997a Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB.
llvm-svn: 129038
2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen 1ec41e2bd9 These tests no longer require linear scan because reserved register coalescing is now universal.
llvm-svn: 128936
2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen 6aa0fbf4c0 Run LiveDebugVariables in RegAllocBasic and RegAllocGreedy.
llvm-svn: 128935
2011-04-05 21:40:37 +00:00
Jakob Stoklund Olesen e20fec7732 Fix one more batch of X86 tests to be register allocation dependent.
llvm-svn: 128919
2011-04-05 20:20:30 +00:00
Jakob Stoklund Olesen 18fd84c79a When dead code elimination removes all but one use, try to fold the single def into the remaining use.
Rematerialization can leave single-use loads behind that we might as well fold whenever possible.

llvm-svn: 128918
2011-04-05 20:20:26 +00:00
Johnny Chen 293875ef55 Fix test-llvm failures.
llvm-svn: 128906
2011-04-05 18:41:40 +00:00
Stuart Hastings 345094777f ARM doesn't support byval yet. XFAIL this test until it does.
llvm-svn: 128891
2011-04-05 17:16:21 +00:00
Jakob Stoklund Olesen 76ad3debab Ensure all defs referring to a virtual register are marked dead by addRegisterDead().
There can be multiple defs for a single virtual register when they are defining
sub-registers.

The missing <dead> flag was stopping the inline spiller from eliminating dead
code after rematerialization.

llvm-svn: 128888
2011-04-05 16:53:50 +00:00
Rafael Espindola 7dd4d6e2e8 Print visibility info for external variables.
llvm-svn: 128887
2011-04-05 15:51:32 +00:00
Eric Christopher f392a69ff7 Fix up testcase for previous commit.
llvm-svn: 128870
2011-04-05 00:56:01 +00:00
Jakob Stoklund Olesen bd09d45489 Fix register-dependent X86 tests.
llvm-svn: 128867
2011-04-05 00:32:44 +00:00
Jakob Stoklund Olesen 2e85396509 Allow coalescing with reserved physregs in certain cases:
When a virtual register has a single value that is defined as a copy of a
reserved register, permit that copy to be joined. These virtual register are
usually copies of the stack pointer:

  %vreg75<def> = COPY %ESP; GR32:%vreg75
  MOV32mr %vreg75, 1, %noreg, 0, %noreg, %vreg74<kill>
  MOV32mi %vreg75, 1, %noreg, 8, %noreg, 0
  MOV32mi %vreg75<kill>, 1, %noreg, 4, %noreg, 0
  CALLpcrel32 ...

Coalescing these virtual registers early decreases register pressure.
Previously, they were coalesced by RALinScan::attemptTrivialCoalescing after
register allocation was completed.

The lower register pressure causes the mcinst-lowering-cmp0.ll test case to fail
because it depends on linear scan spilling a particular register.

I am deleting 2008-08-05-SpillerBug.ll because it is counting the number of
instructions emitted, and its revision history shows the 'correct' count being
edited many times.

llvm-svn: 128845
2011-04-04 21:00:03 +00:00
Jakob Stoklund Olesen 8296e30627 Disable the PowerPC/Atomics-64 test.
The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is
wrong, and I don't know how to fix it. It seems to be using the correct register
classes for pointers, but it inserts all 32-bit instructions.

llvm-svn: 128835
2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen 218661346a Fix PowerPC tests to be register allocator independent.
llvm-svn: 128827
2011-04-04 17:07:03 +00:00
Che-Liang Chiou e34b271718 ptx: support setp's 4-operand format
llvm-svn: 128767
2011-04-02 08:51:39 +00:00
Cameron Zwarich 6fe5c29430 Do some peephole optimizations to remove pointless VMOVs from Neon to integer
registers that arise from argument shuffling with the soft float ABI. These
instructions are particularly slow on Cortex A8. This fixes one half of
<rdar://problem/8674845>.

llvm-svn: 128759
2011-04-02 02:40:43 +00:00
Jim Grosbach 360c369967 LDRD/STRD instructions should print both Rt and Rt2 in the asm string.
llvm-svn: 128736
2011-04-01 20:26:57 +00:00
Akira Hatanaka 93f898f643 Add code for analyzing FP branches. Clean up branch Analysis functions.
llvm-svn: 128718
2011-04-01 17:39:08 +00:00
Evan Cheng a6a992a662 Add test case.
llvm-svn: 128707
2011-04-01 06:27:25 +00:00
Evan Cheng 0f86d6de50 FileCheck'ify test.
llvm-svn: 128706
2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen 100f53fd25 Fix Thumb and Thumb2 tests to be register allocator independent.
llvm-svn: 128690
2011-03-31 23:31:50 +00:00
Jakob Stoklund Olesen 0709342652 Provide a legal pointer register class when targeting thumb1.
The LocalStackSlotAllocation pass was creating illegal registers.

llvm-svn: 128687
2011-03-31 23:02:15 +00:00
Jakob Stoklund Olesen 903baeac27 Fix SystemZ tests
llvm-svn: 128686
2011-03-31 23:02:12 +00:00
Jakob Stoklund Olesen 0888bcf542 Fix ARM tests to be register allocator independent.
llvm-svn: 128680
2011-03-31 22:14:03 +00:00
Evan Cheng 38bf5adcea Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen f4c9754d5c Fix Mips, Sparc, and XCore tests that were dependent on register allocation.
Add an extra run with -regalloc=basic to keep them honest.

llvm-svn: 128654
2011-03-31 18:42:43 +00:00
Akira Hatanaka a535270d91 Added support for FP conditional move instructions and fixed bugs in handling of FP comparisons.
llvm-svn: 128650
2011-03-31 18:26:17 +00:00
Jakob Stoklund Olesen e6e6750670 Don't completely eliminate identity copies that also modify super register liveness.
Turn them into noop KILL instructions instead. This lets the scavenger know when
super-registers are killed and defined.

llvm-svn: 128645
2011-03-31 17:55:25 +00:00
Jakob Stoklund Olesen 9a78835414 Mark all uses as <undef> when joining a copy.
This way, shrinkToUses() will ignore the instruction that is about to be
deleted, and we avoid leaving invalid live ranges that SplitKit doesn't like.

Fix a misunderstanding in MachineVerifier about <def,undef> operands. The
<undef> flag is valid on def operands where it has the same meaning as <undef>
on a use operand. It only applies to sub-register defines which also read the
full register.

llvm-svn: 128642
2011-03-31 17:23:25 +00:00
Richard Osborne 9a827b30ab Add XCore intrinsics for initializing / starting / synchronizing threads.
llvm-svn: 128633
2011-03-31 15:13:13 +00:00
Jakob Stoklund Olesen ae044c06bf Pick a conservative register class when creating a small live range for remat.
The rematerialized instruction may require a more constrained register class
than the register being spilled. In the test case, the spilled register has been
inflated to the DPR register class, but we are rematerializing a load of the
ssub_0 sub-register which only exists for DPR_VFP2 registers.

The register class is reinflated after spilling, so the conservative choice is
only temporary.

llvm-svn: 128610
2011-03-31 03:54:44 +00:00
Evan Cheng ee9d45dd55 Don't try to create zero-sized stack objects.
llvm-svn: 128586
2011-03-30 23:44:13 +00:00
Cameron Zwarich 53dd03d537 Add a ARM-specific SD node for VBSL so that forms with a constant first operand
can be recognized. This fixes <rdar://problem/9183078>.

llvm-svn: 128584
2011-03-30 23:01:21 +00:00
Evan Cheng 18381b4257 Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends
was lowering them to sext / uxt + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontend will codegen them
as intrinsics vmull* + add / sub. Also note the isel optimizations for catching
mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
Cameron Zwarich 143f9aea2b Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes
<rdar://problem/8875309> and <rdar://problem/9057191>.

llvm-svn: 128492
2011-03-29 21:41:55 +00:00
Rafael Espindola 6b2fac21ca Reduce test case.
llvm-svn: 128445
2011-03-29 02:18:54 +00:00
Evan Cheng e2086e740f Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extend's and take advantage of no-stall
back to back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6                                                                                                                                                                             
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
Bill Wendling 96f962fdff In some cases, the "fail BB dominator" may be null after the BB was split (and
becomes reachable when before it wasn't). Check to make sure that it's not null
before trying to use it.

llvm-svn: 128434
2011-03-28 23:02:18 +00:00
Jakob Stoklund Olesen 9a624fa993 Collect and coalesce DBG_VALUE instructions before emitting the function.
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.

The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.

llvm-svn: 128327
2011-03-26 02:19:36 +00:00
Eric Christopher d553096688 Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Jakob Stoklund Olesen 1886a4c823 Emit less labels for debug info and stop emitting .loc directives for DBG_VALUEs.
The .dot directives don't need labels, that is a leftover from when we created
line number info manually.

Instructions following a DBG_VALUE can share its label since the DBG_VALUE
doesn't produce any code.

llvm-svn: 128284
2011-03-25 17:20:59 +00:00
Devang Patel 71536de752 Move test in x86 specific area.
llvm-svn: 128245
2011-03-24 22:39:09 +00:00
Devang Patel e01b75cb89 Keep track of directory namd and fIx regression caused by Rafael's patch r119613.
A better approach would be to move source id handling inside MC.

llvm-svn: 128233
2011-03-24 20:30:50 +00:00
NAKAMURA Takumi 521eb7c11e Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets.
FIXME: Some cleanups would be needed.
llvm-svn: 128206
2011-03-24 07:07:00 +00:00
Cameron Zwarich 4649f17db1 Do early taildup of ret in CodeGenPrepare for potential tail calls that have a
void return type. This fixes PR9487.

llvm-svn: 128197
2011-03-24 04:52:10 +00:00
Devang Patel abc77347a7 Enable GlobalMerge on darwin.
llvm-svn: 128183
2011-03-23 23:34:19 +00:00
Andrew Trick 4ab9a16569 Revert r128175.
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix.

llvm-svn: 128181
2011-03-23 23:11:02 +00:00
Evan Cheng 425489d397 Cmp peephole optimization isn't always safe for signed arithmetics.
int tries = INT_MAX;    
while (tries > 0) {
      tries--;
}

The check should be:
        subs    r4, #1
        cmp     r4, #0
        bgt     LBB0_1

The subs can set the overflow V bit when r4 is INT_MAX+1 (which loop
canonicalization apparently does in this case). cmp #0 would have cleared
it while not changing the N and Z bits. Since BGT is dependent on the V
bit, i.e. (N == V) && !Z, it is not safe to eliminate the cmp #0.

rdar://9172742

llvm-svn: 128179
2011-03-23 22:52:04 +00:00
Eli Friedman 4c192305bf PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND.
Also cleaning up some duplicated code while I'm here.

llvm-svn: 128176
2011-03-23 22:18:48 +00:00
Andrew Trick 4046a0de91 Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS.
(target-specific branchless method for double-width relational comparisons on x86)

llvm-svn: 128175
2011-03-23 22:16:02 +00:00
Jakob Stoklund Olesen ec0ac3ca40 Reapply r128045 and r128051 with fixes.
This will extend the ranges of debug info variables in registers until they are
clobbered.

Fix 1: Don't mistake DBG_VALUE instructions referring to incoming arguments on
the stack with DBG_VALUE instructions referring to variables in the frame
pointer. This fixes the gdb test-suite failure.

Fix 2: Don't trace through copies to physical registers setting up call
arguments. These registers are call clobbered, and the source register is more
likely to be a callee-saved register that can be extended through the call
instruction.

llvm-svn: 128114
2011-03-22 22:33:08 +00:00
Andrew Trick b0f98bb5e9 Revert r128045 and r128051, debug info enhancements.
Temporarily reverting these to see if we can get llvm-objdump to link. Hopefully this is not the problem.

llvm-svn: 128097
2011-03-22 19:18:42 +00:00
Che-Liang Chiou 7413080cea ptx: add analyze/insert/remove branch
llvm-svn: 128084
2011-03-22 14:12:00 +00:00
Jakob Stoklund Olesen 9c057ee440 Dont emit 'DBG_VALUE %noreg, ...' to terminate user variable ranges.
These ranges get completely jumbled by the post-ra scheduler, and it is not
really reasonable to expect it to make sense of them.

Instead, teach DwarfDebug to notice when user variables in registers are
clobbered, and terminate the ranges there.

llvm-svn: 128045
2011-03-22 00:21:41 +00:00
Dan Gohman c1783b31a4 Fix fast-isel address mode folding to avoid folding instructions
outside of the current basic block. This fixes PR9500, rdar://9156159.

llvm-svn: 128041
2011-03-22 00:04:35 +00:00
Rafael Espindola 1557fd6d39 Write the section table and the section data in the same order that
gun as does. This makes it a lot easier to compare the output of both
as the addresses are now a lot closer.

llvm-svn: 127972
2011-03-20 18:44:20 +00:00
Daniel Dunbar 327cd36f74 Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors
to canonicalize IR", it broke a lot of things.

llvm-svn: 127954
2011-03-19 21:47:14 +00:00
Evan Cheng 824a711305 SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
  switch(x) {
  case 1: return f1();
  case 2: return f2();
  case 3: return f3();
  case 4: return f4();
  case 5: return f5();
  case 6: return f6();
  }
}

=>
LBB0_2:                                 ## %sw.bb
  callq   _f1
  popq    %rbp
  ret
LBB0_3:                                 ## %sw.bb1
  callq   _f2
  popq    %rbp
  ret
LBB0_4:                                 ## %sw.bb3
  callq   _f3
  popq    %rbp
  ret

This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:

sw.bb7:                                           ; preds = %entry
  %call8 = tail call i32 @f5() nounwind
  br label %return
sw.bb9:                                           ; preds = %entry
  %call10 = tail call i32 @f6() nounwind
  br label %return
return:
  %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
  ret i32 %retval.0

This allows codegen to generate better code like this:

LBB0_2:                                 ## %sw.bb
        jmp     _f1                     ## TAILCALL
LBB0_3:                                 ## %sw.bb1
        jmp     _f2                     ## TAILCALL
LBB0_4:                                 ## %sw.bb3
        jmp     _f3                     ## TAILCALL

rdar://9147433

llvm-svn: 127953
2011-03-19 17:17:39 +00:00
Nadav Rotem e7a101ccab Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
Andrew Trick e7537a0187 FileCheckize a test.
(one-by-one until valgrind is happy)

llvm-svn: 127925
2011-03-19 00:41:39 +00:00
Evan Cheng dc1d626a3d Match a few more obvious patterns to revsh. rdar://9147637.
llvm-svn: 127913
2011-03-18 21:52:42 +00:00
Eli Friedman 59721e3238 Revert r127852; it's apparently causing an ICE on mingw.
llvm-svn: 127909
2011-03-18 21:12:29 +00:00
Justin Holewinski 0984dcc077 PTX: Fix various codegen issues
- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
- (most PTX instructions cannot take global variable immediates)

llvm-svn: 127895
2011-03-18 19:24:28 +00:00
Che-Liang Chiou b1df0fe1cc ptx: fix parameter order that is reversed
llvm-svn: 127874
2011-03-18 11:23:56 +00:00
Che-Liang Chiou ff9d938e33 ptx: add unconditional and conditional branch
llvm-svn: 127873
2011-03-18 11:08:52 +00:00