Commit Graph

15803 Commits

Author SHA1 Message Date
NAKAMURA Takumi 0e60839e11 llvm/test/CodeGen/X86/zext-fold.ll: Relax an expression in stack offset.
llvm-svn: 147928
2012-01-11 07:34:22 +00:00
NAKAMURA Takumi 9e823f6ae1 llvm/test/CodeGen/X86/sub-with-overflow.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 147927
2012-01-11 07:34:14 +00:00
Rafael Espindola 647841b181 Add big endian mips support. Based on a patch by Jack Carter.
llvm-svn: 147924
2012-01-11 04:04:14 +00:00
Rafael Espindola 870c4e92b9 Add the skeleton of an asm parser for mips.
llvm-svn: 147923
2012-01-11 03:56:41 +00:00
Andrew Trick 642f0f6a40 ARM Ld/St Optimizer fix.
Allow LDRD to be formed from pairs with different LDR encodings. This was the original intention of the pass. Somewhere along the way, the LDR opcodes were refined which broke the optimization. We really don't care what the original opcodes are as long as they both map to the same LDRD and the immediate still fits.

Fixes rdar://10435045 ARMLoadStoreOptimization cannot handle mixed LDRi8/LDRi12

llvm-svn: 147922
2012-01-11 03:56:08 +00:00
Jakob Stoklund Olesen 05ff7f06fb Disable test that seems to expose an unrelated Linux issue.
llvm-svn: 147921
2012-01-11 03:42:27 +00:00
Jakob Stoklund Olesen 8b1d023a4a Detect when a value is undefined on an edge to a landing pad.
Consider this code:

int h() {
  int x;
  try {
    x = f();
    g();
  } catch (...) {
    return x+1;
  }
  return x;
}

The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.

SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.

Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().

<rdar://problem/10664933>

llvm-svn: 147912
2012-01-11 02:07:05 +00:00
Bill Wendling c79155192d If the global variable is removed by the linker, then don't constant merge it
with other symbols.

An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>

llvm-svn: 147899
2012-01-11 00:13:08 +00:00
Eric Christopher 43a1182975 Don't avoid recursing for pointer types, just reference types. Expand on
the comment.

Fixes constvars.exp on the gdb test builder.

llvm-svn: 147897
2012-01-11 00:01:29 +00:00
Chad Rosier a415140a2c Add test case for r147881.
llvm-svn: 147891
2012-01-10 23:09:53 +00:00
Joerg Sonnenberger 96cd35cf6d Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes.
Add a test that checks the stack alignment of a simple function for
Darwin, Linux and NetBSD for 32bit and 64bit mode.

llvm-svn: 147888
2012-01-10 22:43:53 +00:00
Jakob Stoklund Olesen 20f1dd5faf Consider unknown alignment caused by OptimizeThumb2Instructions().
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms.  This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.

Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.

Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.

Add a single large test case that will hopefully exercise many parts of
the constant island pass.

<rdar://problem/10670199>

llvm-svn: 147885
2012-01-10 22:32:14 +00:00
Jim Grosbach 74ac7d50a1 ARM updating VST2 pseudo-lowering fixed vs. register update.
rdar://10663487

llvm-svn: 147876
2012-01-10 21:11:12 +00:00
Kevin Enderby 8d4a2204b7 Various crash reporting tools have a problem with the dwarf generated for
assembly source when it generates the TAG_subprogram dwarf debug info for
the labels that have nothing between them as in this bit of assembly source:

% cat ZeroLength.s 
_func1:
_func2:
 nop

One solution would be to not emit the subsequent labels with the same address
and use the next label with a different address or the end of the section for
the AT_high_pc value of the TAG_subprogram.

Turns out in llvm-mc it is not possible in all cases to determine of two
symbols have the same value at the point we put out the TAG_subprogram dwarf
debug info.

So we will have llvm-mc instead of putting out TAG_subprogram's put out
DW_TAG_label's.  And the DW_TAG_label does not have a AT_high_pc value which
avoids the problem.

This commit is only the functional change to make the diffs clear as to what is
really being changed.  The next commit will be to clean up the names of such
things like MCGenDwarfSubprogramEntry to something like MCGenDwarfLabelEntry.

rdar://10666925

llvm-svn: 147860
2012-01-10 17:52:29 +00:00
Nadav Rotem 61bdf79035 Fix a bug in the legalization of shuffle vectors. When we emulate shuffles using BUILD_VECTORS we may be using a BV of different type. Make sure to cast it back.
llvm-svn: 147851
2012-01-10 14:28:46 +00:00
Craig Topper 430f3f1bd6 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Evan Cheng 0be4144a68 Allow machine-cse to look across MBB boundary when cse'ing instructions that
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.

rdar://10660865

llvm-svn: 147827
2012-01-10 02:02:58 +00:00
Andrew Trick d5d2db9af9 Enable LSR IV Chains with sufficient heuristics.
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.

Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.

Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.

On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.

llvm-svn: 147826
2012-01-10 01:45:08 +00:00
Andrew Trick 248d410e3e Adding IV chain generation to LSR.
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.

In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.

llvm-svn: 147801
2012-01-09 21:18:52 +00:00
Benjamin Kramer f9d0cc0160 InstCombine: Teach foldLogOpOfMaskedICmpsHelper that sign bit tests are bit tests.
This subsumes several other transforms while enabling us to catch more cases.

llvm-svn: 147777
2012-01-09 17:23:27 +00:00
Chandler Carruth ea27c16adc Cleanup and FileCheck-ize a test.
llvm-svn: 147772
2012-01-09 09:44:26 +00:00
Craig Topper ef7f5bf8c9 Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing.
llvm-svn: 147767
2012-01-09 06:52:46 +00:00
Rafael Espindola f28213ca01 Don't print an unused label before .cfi_endproc.
llvm-svn: 147763
2012-01-09 00:17:29 +00:00
Craig Topper 744f6311d3 Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level.
llvm-svn: 147762
2012-01-09 00:11:29 +00:00
Benjamin Kramer 6609f741b9 Tweak my last commit to be less conservative about uses.
We still save an instruction when just the "and" part is replaced.
Also change the code to match comments more closely.

llvm-svn: 147753
2012-01-08 21:12:51 +00:00
Benjamin Kramer da37e15345 InstCombine: If we have a bit test and a sign test anded/ored together, merge the sign bit into the bit test.
This is common in bit field code, e.g. checking if the first or the last bit of a bit field is set.

llvm-svn: 147749
2012-01-08 18:32:24 +00:00
Victor Umansky 540651cf59 Reverted commit #147601 upon Evan's request.
llvm-svn: 147748
2012-01-08 17:20:33 +00:00
Rafael Espindola 382412032c Don't print a label before .cfi_startproc when we don't need to. This makes
the produce assembly when using CFI just a bit more readable.

llvm-svn: 147743
2012-01-07 22:42:19 +00:00
Jakob Stoklund Olesen 8cdce7e690 Use getRegForValue() to materialize the address of ARM globals.
This enables basic local CSE, giving us 20% smaller code for
consumer-typeset in -O0 builds.

<rdar://problem/10658692>

llvm-svn: 147720
2012-01-07 04:07:22 +00:00
Andrew Trick 732ad80dbb LSR: Don't optimize loops if an outer loop has no preheader.
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.

llvm-svn: 147718
2012-01-07 03:16:50 +00:00
Rafael Espindola 0708209642 Split Finish into Finish and FinishImpl to have a common place to do end of
file error checking. Use that to error on an unfinished cfi_startproc.

The error is not nice, but is already better than a segmentation fault.

llvm-svn: 147717
2012-01-07 03:13:18 +00:00
Evan Cheng 00b1a3cd7e Added a late machine instruction copy propagation pass. This catches
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
        movl    %eax, %ecx
        movl    %ecx, %eax
        ret

The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)

This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.

rdar://10428165
rdar://10640363

llvm-svn: 147716
2012-01-07 03:02:36 +00:00
Jakob Stoklund Olesen 68f034ee1a Use movw+movt in ARMFastISel::ARMMaterializeGV.
This eliminates a lot of constant pool entries for -O0 builds of code
with many global variable accesses.

This speeds up -O0 codegen of consumer-typeset by 2x because the
constant island pass no longer has to look at thousands of constant pool
entries.

<rdar://problem/10629774>

llvm-svn: 147712
2012-01-07 01:47:05 +00:00
Andrew Trick 5adedf5d47 Extended replaceCongruentPhis to handle mixed phi types.
llvm-svn: 147707
2012-01-07 01:12:09 +00:00
Eric Christopher c206d46709 Make the 'x' constraint work for AVX registers as well.
Fixes rdar://10614894

llvm-svn: 147704
2012-01-07 01:02:09 +00:00
Andrew Trick cbf2fe066a comment typo
llvm-svn: 147701
2012-01-07 00:29:20 +00:00
Jakob Stoklund Olesen 68a922c0e9 Enable aligned NEON spilling by default.
Experiments show this to be a small speedup for modern ARM cores.

llvm-svn: 147689
2012-01-06 22:19:37 +00:00
Dan Gohman 5ab9c0a927 Fix SpeculativelyExecuteBB to either speculate all or none of the phis
present in the bottom of the CFG triangle, as the transformation isn't
ever valuable if the branch can't be eliminated.

Also, unify some heuristics between SimplifyCFG's multiple
if-converters, for consistency.

This fixes rdar://10627242.

llvm-svn: 147630
2012-01-05 23:58:56 +00:00
Eli Friedman 55fa49f32d PR11705, part 2: globalopt shouldn't put inttoptr/ptrtoint operations into global initializers if there's an implied extension or truncation.
llvm-svn: 147625
2012-01-05 23:03:32 +00:00
Rafael Espindola 23f8d64b58 Link symbols with different visibilities according to the rules in the
System V Application Binary Interface. This lets us use
-fvisibility-inlines-hidden with LTO.
Fixes PR11697.

llvm-svn: 147624
2012-01-05 23:02:01 +00:00
Dan Gohman 5267211899 Revert r56315. When the instruction to speculate is a load, this
code can incorrectly move the load across a store. This never
happens in practice today, but only because the current
heuristics accidentally preclude it.

llvm-svn: 147623
2012-01-05 22:54:35 +00:00
Chandler Carruth e041a30bb9 Prevent a DAGCombine from firing where there are two uses of
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.

llvm-svn: 147604
2012-01-05 11:05:55 +00:00
Chandler Carruth 6bc151f5d4 Cleanup and FileCheck-ize a test.
llvm-svn: 147603
2012-01-05 11:05:47 +00:00
Victor Umansky 9255b6d9fe Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)

Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
2012-01-05 08:46:19 +00:00
Benjamin Kramer aca1885695 FileCheck hygiene.
llvm-svn: 147580
2012-01-05 00:43:34 +00:00
Jakob Stoklund Olesen d110e2a83f Reapply r146997, "Heed spill slot alignment on ARM."
Now that canRealignStack() understands frozen reserved registers, it is
safe to use it for aligned spill instructions.

It will only return true if the registers reserved at the beginning of
register allocation allow for dynamic stack realignment.

<rdar://problem/10625436>

llvm-svn: 147579
2012-01-05 00:26:57 +00:00
Nick Lewycky 0c48afa0ed Teach instcombine all sorts of great stuff about shifts that have exact, nuw or
nsw bits on them.

llvm-svn: 147528
2012-01-04 09:28:29 +00:00
NAKAMURA Takumi 91a3f886ef test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov".
llvm-svn: 147521
2012-01-04 03:52:23 +00:00
Akira Hatanaka c669d7a6db Have getRegForInlineAsmConstraint return the correct register class when target
is Mips64.

llvm-svn: 147516
2012-01-04 02:45:01 +00:00
Evan Cheng 801d98b3f0 Fix more places which should be checking for iOS, not darwin.
llvm-svn: 147513
2012-01-04 01:55:04 +00:00
Evan Cheng 104dbb0fd1 For x86, canonicalize max
(x > y) ? x : y
=>
(x >= y) ? x : y

So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0

This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl   %esi, %edi
testl  %edi, %edi
movl   $0, %eax
cmovgl %edi, %eax
=>
xorl   %eax, %eax
subl   %esi, $edi
cmovsl %eax, %edi

rdar://10633221

llvm-svn: 147512
2012-01-04 01:41:39 +00:00
Kostya Serebryany 842ae27ae3 [asan] one more test for asan instrumentation: (*a)++ should be instrumented only once.
llvm-svn: 147509
2012-01-04 01:02:14 +00:00
Jakob Stoklund Olesen 1b7f2a7638 Revert r146997, "Heed spill slot alignment on ARM."
This patch caused a miscompilation of oggenc because a frame pointer was
suddenly needed halfway through register allocation.

<rdar://problem/10625436>

llvm-svn: 147487
2012-01-03 22:34:35 +00:00
Nadav Rotem 6d31bac85e Revert 147426 because it caused pr11696.
llvm-svn: 147485
2012-01-03 22:19:42 +00:00
Nadav Rotem 1e7dda13c8 Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted.
llvm-svn: 147484
2012-01-03 22:12:28 +00:00
Chad Rosier 493c1b3152 Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
rdar://10594409

llvm-svn: 147481
2012-01-03 21:05:52 +00:00
Elena Demikhovsky 8ec21a2801 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147445
2012-01-03 11:59:04 +00:00
Andrew Trick cbcc98fb50 Fix SCEVExpander to handle loops with no preheader when LSR gives it a
"phony" insertion point.

Fixes rdar://10619599: "SelectionDAGBuilder shouldn't visit PHI nodes!" assert

llvm-svn: 147439
2012-01-02 21:25:10 +00:00
Nadav Rotem 6c7a0e6c8b Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit.
llvm-svn: 147426
2012-01-02 08:05:46 +00:00
Craig Topper b910984458 Allow CRC32 instructions to be selected when AVX is enabled.
llvm-svn: 147411
2012-01-01 19:51:58 +00:00
Craig Topper 1c064e0a89 Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers.
llvm-svn: 147409
2012-01-01 19:40:22 +00:00
Rafael Espindola d3df940169 Revert 147399. It broke CodeGen/ARM/vext.ll.
llvm-svn: 147400
2012-01-01 17:36:23 +00:00
Elena Demikhovsky 67f80c3432 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147399
2012-01-01 16:22:47 +00:00
Craig Topper d51092d93a Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load.
llvm-svn: 147393
2011-12-31 23:24:49 +00:00
Craig Topper 0e796fee11 Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected.
llvm-svn: 147392
2011-12-31 23:15:11 +00:00
Nick Lewycky b59008c694 Make use of the exact bit when optimizing '(X >>exact 3) << 1' to eliminate the
'and' that would zero out the trailing bits, and to produce an exact shift
ourselves.

llvm-svn: 147391
2011-12-31 21:30:22 +00:00
Craig Topper 2ba766ae84 Add disassembler support for VPERMIL2PD and VPERMIL2PS.
llvm-svn: 147368
2011-12-30 06:23:39 +00:00
Craig Topper 03a0beda88 Add FMA4 instructions to disassembler.
llvm-svn: 147367
2011-12-30 05:20:36 +00:00
Craig Topper 6c08930c5e Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms.
llvm-svn: 147361
2011-12-30 02:18:36 +00:00
Craig Topper 2ca79b9d4b Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere.
llvm-svn: 147360
2011-12-30 01:49:53 +00:00
Hal Finkel 692d1fb355 Cleanup stack/frame register define/kill states. This fixes two bugs:
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).

2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.

llvm-svn: 147359
2011-12-30 00:34:00 +00:00
Rafael Espindola 4ea99816ef Implement cfi_restore. Patch by Brian Anderson!
llvm-svn: 147356
2011-12-29 21:43:03 +00:00
Craig Topper d773607eee Fix execution domains for PS/PD FMA3 instructions. Add SS/SD forms o FMA3 instructions.
llvm-svn: 147353
2011-12-29 20:43:40 +00:00
Rafael Espindola ef4aa35164 Implement .cfi_escape. Patch by Brian Anderson!
llvm-svn: 147352
2011-12-29 20:24:47 +00:00
Craig Topper 8cab06a214 Expose FMA3 instructions to the disassembler.
llvm-svn: 147351
2011-12-29 20:03:14 +00:00
Nick Lewycky 4c378a4453 Change CaptureTracking to pass a Use* instead of a Value* when a value is
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.

Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.

llvm-svn: 147327
2011-12-28 23:24:21 +00:00
Eli Friedman 3a01ddb7e9 Fix type-checking for load transformation which is not legal on floating-point types. PR11674.
llvm-svn: 147323
2011-12-28 21:24:44 +00:00
Nadav Rotem 3c3dd6e588 PR11662.
Promotion of the mask operand needs to be done using PromoteTargetBoolean, and not padded with garbage.

llvm-svn: 147309
2011-12-28 13:08:20 +00:00
Elena Demikhovsky b3515a8d4b Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR.
Matching MOVLP mask for AVX (265-bit vectors) was wrong.
The failure was detected by conformance tests.

llvm-svn: 147308
2011-12-28 08:14:01 +00:00
Nick Lewycky a8e84fb56b Turn cos(-x) into cos(x). Patch by Alexander Malyshev!
llvm-svn: 147291
2011-12-27 18:25:50 +00:00
Nick Lewycky c554a9b58e Teach simplifycfg to recompute branch weights when merging some branches, and
to discard weights when appropriate. Still more to do (and a new TODO), but
it's a start!

llvm-svn: 147286
2011-12-27 04:31:52 +00:00
Eli Friedman e96286cdf2 Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2.
llvm-svn: 147283
2011-12-26 22:49:32 +00:00
Nick Lewycky 8d302df4a4 Update the branch weight metadata when reversing the order of a branch.
llvm-svn: 147280
2011-12-26 20:54:14 +00:00
Chandler Carruth 8b7e71ffd6 Add an explicit test that we now fold cttz.i32(..., true) >> 5 -> 0.
This is a result of Benjamin's work on ValueTracking.

llvm-svn: 147259
2011-12-24 22:34:15 +00:00
Benjamin Kramer b16bd77bd2 InstCombine: Add a combine that turns (2^n)-1 ^ x back into (2^n)-1 - x iff x is smaller than 2^n and it fuses with a following add.
This was intended to undo the sub canonicalization in cases where it's not profitable, but it also
finds some cases on it's own.

llvm-svn: 147256
2011-12-24 17:31:53 +00:00
Benjamin Kramer 4ee5747fdd ComputeMaskedBits: Make knownzero computation more aggressive for ctlz with undef zero.
unsigned foo(unsigned x) { return 31 - __builtin_clz(x); }
now compiles into a single "bsrl" instruction on x86.

llvm-svn: 147255
2011-12-24 17:31:46 +00:00
Benjamin Kramer 010337c838 InstCombine: Canonicalize (2^n)-1 - x into (2^n)-1 ^ x iff x is known to be smaller than 2^n.
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.

llvm-svn: 147254
2011-12-24 17:31:38 +00:00
Chandler Carruth a3d54fe0ae Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type

We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]

llvm-svn: 147251
2011-12-24 12:12:34 +00:00
Chandler Carruth 38ce24455d Add systematic testing for cttz as well, and fix the bug I spotted by
inspection earlier.

llvm-svn: 147250
2011-12-24 11:46:10 +00:00
Chandler Carruth 103ca80f59 Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test.
llvm-svn: 147249
2011-12-24 11:26:59 +00:00
Chandler Carruth 44cf07228b Tidy up this rather crufty test. Put the declarations at the top to make
my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.

This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[

llvm-svn: 147248
2011-12-24 11:26:57 +00:00
Chandler Carruth c9fcde2347 Expand more when we have a nice 'tzcnt' instruction, to avoid generating
'bsf' instructions here.

This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.

llvm-svn: 147246
2011-12-24 11:11:38 +00:00
Chandler Carruth eeb3a1ce3e Tidy up some of these tests.
llvm-svn: 147245
2011-12-24 11:11:36 +00:00
Chandler Carruth 7e9453e916 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.

llvm-svn: 147244
2011-12-24 10:55:54 +00:00
Chandler Carruth 15075d4b19 Cleanup this test a bit, sorting things and grouping them more clearly.
llvm-svn: 147243
2011-12-24 10:55:42 +00:00
Akira Hatanaka 79329ce425 Test case for r147232.
llvm-svn: 147233
2011-12-24 03:05:43 +00:00
Nick Lewycky 854c869c36 Move this test from date-name to feature-name, and port it to FileCheck.
llvm-svn: 147223
2011-12-23 18:41:31 +00:00
Jakob Stoklund Olesen 0965585cb1 Experimental support for aligned NEON spills.
ARM targets with NEON units have access to aligned vector loads and
stores that are potentially faster than unaligned operations.

Add support for spilling the callee-saved NEON registers to an aligned
stack area using 16-byte aligned NEON loads and store.

This feature is off by default, controlled by an -align-neon-spills
command line option.

llvm-svn: 147211
2011-12-23 00:36:18 +00:00
Jim Grosbach ea2319112f ARM VFP assembly parsing and encoding for VCVT(float <--> fixed point).
rdar://10558523

llvm-svn: 147189
2011-12-22 22:19:05 +00:00
Rafael Espindola 6ca42c5be3 Fix incorrect relocation generation. Patch by Kristof Beyls.
Fixes PR11214.

llvm-svn: 147180
2011-12-22 21:36:43 +00:00
Chad Rosier 388769427d Reinstate r146578; it doesn't appear to be the cause of some recent execution-
time regressions.  In general, it is beneficial to compile-time.

Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.

llvm-svn: 147175
2011-12-22 21:06:36 +00:00
Jim Grosbach 12ccf45bbb ARM assembler should accept shift-by-zero for any shifted-immediate operand.
Just treat it as-if the shift wasn't there at all. 'as' compatibility.

rdar://10604767

llvm-svn: 147153
2011-12-22 18:04:04 +00:00
Benjamin Kramer f1fd6e394d Give string constants generated by IRBuilder private linkage.
Fixes PR11640.

llvm-svn: 147144
2011-12-22 14:22:14 +00:00
Chandler Carruth b024aa021d Make the unreachable probability much much heavier. The previous
probability wouldn't be considered "hot" in some weird loop structures
or other compounding probability patterns. This makes it much harder to
confuse, but isn't really a principled fix. I'd actually like it if we
could model a zero probability, as it would make this much easier to
reason about. Suggestions for how to do this better are welcome.

llvm-svn: 147142
2011-12-22 09:26:37 +00:00
Chad Rosier 1b7e2baf47 Speculatively revert r146578 to determine if it is the cause of a number of
performance regressions (both execution-time and compile-time) on our
nightly testers.

Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.

llvm-svn: 147131
2011-12-22 02:40:57 +00:00
Akira Hatanaka e2eed9649e Local dynamic TLS model for direct object output. Create the correct TLS MIPS
ELF relocations.

Patch by Jack Carter.

llvm-svn: 147118
2011-12-22 01:05:17 +00:00
Jim Grosbach 7869d8c01e ARM VFP optional data type on VMOV GPR<-->SPR.
llvm-svn: 147104
2011-12-21 23:24:15 +00:00
Jim Grosbach 8c59bbc1ed Thumb2 assembly parsing of 'mov rd, rn, rrx'.
Maps to the RRX instruction. Missed this case earlier.

rdar://10615373

llvm-svn: 147096
2011-12-21 21:04:19 +00:00
Jim Grosbach b3ef713e44 Thumb2 assembly parsing of 'mov(register shifted register)' aliases.
These map to the ASR, LSR, LSL, ROR instruction definitions.

rdar://10615373

llvm-svn: 147094
2011-12-21 20:54:00 +00:00
Jim Grosbach c80a264386 ARM NEON assmebly parsing for VLD2 to all lanes instructions.
llvm-svn: 147069
2011-12-21 19:40:55 +00:00
Chad Rosier 7248bda595 Fix a couple of copy-n-paste bugs. Noticed by George Russell!
llvm-svn: 147064
2011-12-21 18:56:22 +00:00
Nick Lewycky b4039f633c Make some intrinsics safe to speculatively execute.
llvm-svn: 147036
2011-12-21 05:52:02 +00:00
Evan Cheng dc8a1aaea6 Fix a couple of copy-n-paste bugs. Noticed by George Russell.
llvm-svn: 147032
2011-12-21 03:04:10 +00:00
Jim Grosbach c5af54ec89 ARM NEON VLD2 assembly parsing for structure to all lanes, non-writeback.
llvm-svn: 147025
2011-12-21 00:38:54 +00:00
Akira Hatanaka 964c891e61 Fix bug in zero-store peephole pattern reported in pr11615.
The patch and test case were originally written by Mans Rullgard.

llvm-svn: 147024
2011-12-21 00:31:10 +00:00
Akira Hatanaka 1d8efaba7e Expand 64-bit CTLZ nodes if target architecture does not support it. Add test
case for DCLO and DCLZ.

llvm-svn: 147022
2011-12-21 00:20:27 +00:00
Akira Hatanaka bd95275f7a Test case for r147017.
llvm-svn: 147018
2011-12-20 23:58:36 +00:00
Jim Grosbach 6ac54afeba Enable and fix a test.
llvm-svn: 147011
2011-12-20 23:20:00 +00:00
Akira Hatanaka cb2a85bc22 Add function MipsDAGToDAGISel::SelectMULT and factor out code that generates
nodes needed for multiplication. Add code for selecting 64-bit MULHS and MULHU
nodes.

llvm-svn: 147008
2011-12-20 23:10:57 +00:00
Akira Hatanaka cf10f08825 64-bit data directive.
llvm-svn: 147005
2011-12-20 22:52:19 +00:00
Akira Hatanaka 494fdf1499 32-to-64-bit sext_inreg pattern.
llvm-svn: 147004
2011-12-20 22:40:40 +00:00
Akira Hatanaka dac1d48d8d Add code in MipsDAGToDAGISel for selecting constant +0.0.
MIPS64 can generate constant +0.0 with a single DMTC1 instruction. 

llvm-svn: 146999
2011-12-20 22:25:50 +00:00
Jakob Stoklund Olesen b95c102c2f Heed spill slot alignment on ARM.
Use the spill slot alignment as well as the local variable alignment to
determine when the stack needs to be realigned. This works now that the
ARM target can always realign the stack by using a base pointer.

Still respect the ARMBaseRegisterInfo::canRealignStack() function
vetoing a realigned stack.  Don't use aligned spill code in that case.

llvm-svn: 146997
2011-12-20 22:15:04 +00:00
Jim Grosbach 2c59052984 ARM assembly parsing and encoding for VST2 single-element, double spaced.
llvm-svn: 146990
2011-12-20 20:46:29 +00:00
Jim Grosbach c8ebeff9a1 ARM enable a few more tests.
llvm-svn: 146985
2011-12-20 20:03:00 +00:00
Jim Grosbach 75e2ab5db2 ARM assembly parsing and encoding for VLD2 single-element, double spaced.
llvm-svn: 146983
2011-12-20 19:21:26 +00:00
Evan Cheng 68132d8093 ARM target code clean up. Check for iOS, not Darwin where it makes sense.
llvm-svn: 146981
2011-12-20 18:26:50 +00:00
Elena Demikhovsky ec7e6e0946 This is the second fix related to VZEXT_MOVL node.
The failure that I see in the current version is:

LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14]
  0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13]
    0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12]
      0x18b9870: v4i64 = undef [ID=4]
      0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10]
        0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
      0x18b9970: i32 = Constant<0> [ID=3]
    0x18b9170: v2i64 = undef [ORD=1] [ID=1]
    0x18b9570: i32 = Constant<2> [ID=5]

llvm-svn: 146975
2011-12-20 13:34:28 +00:00
Chandler Carruth 24680c24d8 Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

llvm-svn: 146974
2011-12-20 11:19:37 +00:00
Andrew Trick a34a8c45b4 Unit test for r146950: LSR postinc expansion, PR11571.
llvm-svn: 146951
2011-12-20 01:43:20 +00:00
Bob Wilson 75f12cc3fe Mark ARM eh_sjlj_dispatchsetup as clobbering all registers. Radar 10567930.
We used to rely on the *eh_sjlj_setjmp instructions to mark that a function
with setjmp/longjmp exception handling clobbers all the registers.  But with
the recent reorganization of ARM EH, those eh_sjlj_setjmp instructions are
expanded away earlier, before PEI can see them to determine what registers to
save and restore.  Mark the dispatchsetup instruction in the same way, since
that instruction cannot be expanded early.  This also more accurately reflects
when the registers are clobbered.

llvm-svn: 146949
2011-12-20 01:29:27 +00:00
Jim Grosbach e2ca9e5b5f ARM assembly shifts by zero should be plain 'mov' instructions.
"mov r1, r2, lsl #0" should assemble as "mov r1, r2" even though it's
not strictly legal UAL syntax. It's a common extension and the friendly
thing to do.

rdar://10604663

llvm-svn: 146937
2011-12-20 00:59:38 +00:00
Chris Lattner 9eb3f00406 Now that PR11464 is fixed, reapply the patch to fix PR11464,
merging types by name when we can.  We still don't guarantee type name linkage
but we do it when obviously the right thing to do.  This makes LTO type names 
easier to read, for example.

llvm-svn: 146932
2011-12-20 00:12:26 +00:00
Chris Lattner 5e3bd9727a fix PR11464 by preventing the linker from mapping two different struct types from the source module onto the same opaque destination type. An opaque type can only be resolved to one thing or another after all.
llvm-svn: 146929
2011-12-20 00:03:52 +00:00
Evan Cheng 3bfaefe9e7 Move tests to FileCheck.
llvm-svn: 146923
2011-12-19 23:26:44 +00:00
Jim Grosbach 8648c10184 ARM assembly parsing and encoding support for LDRD(label).
rdar://9932658

llvm-svn: 146921
2011-12-19 23:06:24 +00:00
Akira Hatanaka 37c45db189 Add a test case for r146900.
llvm-svn: 146901
2011-12-19 20:24:28 +00:00
Akira Hatanaka db47e0c49d Add patterns for matching immediates whose lower 16-bit is cleared. These
patterns emit a single LUi instruction instead of a pair of LUi and ORi.

llvm-svn: 146900
2011-12-19 20:21:18 +00:00
Jim Grosbach 64f4de29e0 ARM NEON two-operand aliases for VPADD.
rdar://10602276

llvm-svn: 146895
2011-12-19 19:51:03 +00:00
Akira Hatanaka 2a232d81f6 Remove definitions of double word shift plus 32 instructions. Assembler or
direct-object emitter should emit the appropriate shift instruction depending
on the shift amount.

llvm-svn: 146893
2011-12-19 19:44:09 +00:00
Akira Hatanaka 3c9f336361 Remove the restriction on the first operand of the add node in SelectAddr.
This change reduces the number of instructions generated.

For example, 
(load (add (sub $n0, $n1), (MipsLo got(s))))

results in the following sequence of instructions:
1. sub $n2, $n0, $n1
2. lw got(s)($n2)

Previously, three instructions were needed.
1. sub $n2, $n0, $n1
2. addiu $n3, $n2, got(s)
3. lw 0($n3)

llvm-svn: 146888
2011-12-19 19:28:37 +00:00
Jim Grosbach 9ae4fc035b ARM NEON implied destination aliases for VMAX/VMIN.
llvm-svn: 146885
2011-12-19 18:57:38 +00:00
Jim Grosbach cef98cddbe ARM NEON relax parse time diagnostics for alignment specifiers.
There's more variation that we need to handle. Error checking will need
to be on operand predicates.

llvm-svn: 146884
2011-12-19 18:31:43 +00:00
Joerg Sonnenberger d6cb7649d8 Allow inlining of functions with returns_twice calls, if they have the
attribute themselve.

llvm-svn: 146851
2011-12-18 20:35:43 +00:00
Chad Rosier 5e5bee4c52 Revert 146728 as it's causing failures on some of the external bots as well as
internal nightly testers.  Original commit message:

By popular demand, link up types by name if they are isomorphic and one is an
autorenamed version of the other.   This makes the IR easier to read, because
we don't end up with random renamed versions of the types after LTO'ing a large
app.

llvm-svn: 146838
2011-12-17 22:19:53 +00:00
Kevin Enderby 8b3deabd2d Revert r146822 at Pete Cooper's request as it broke clang self hosting.
Hope I did this correctly :)

llvm-svn: 146834
2011-12-17 19:48:52 +00:00
Pete Cooper eadf124d2b SimplifyCFG now predicts some conditional branches to true or false depending on previous branch on same comparison operands.
For example, 

if (a == b) {
    if (a > b) // this is false
    
Fixes some of the issues on <rdar://problem/10554090>

llvm-svn: 146822
2011-12-17 06:32:38 +00:00
Manuel Klimek 3c2848ea31 Deleting the json-bench-test until I understand why it is flaky.
llvm-svn: 146821
2011-12-17 06:29:32 +00:00
Evan Cheng 903231bc58 Fix a CPSR liveness tracking bug introduced when I converted IT block to bundle.
llvm-svn: 146805
2011-12-17 01:25:34 +00:00
Rafael Espindola d3df3d3527 Add back the MC bits of 126425. Original patch by Nathan Jeffords. I added the
asm parsing and testcase.

llvm-svn: 146801
2011-12-17 01:14:52 +00:00
Lang Hames da07b3ad42 Make sure that the lower bits on the VSELECT condition are properly set.
llvm-svn: 146800
2011-12-17 01:08:46 +00:00
Dan Gohman 518cda42b9 The powers that be have decided that LLVM IR should now support 16-bit
"half precision" floating-point with a first-class type.

This patch adds basic IR support (but not codegen support).

llvm-svn: 146786
2011-12-17 00:04:22 +00:00
Eric Christopher 27886c6c1e When recursing for the original size of a type, stop if we are at a
pointer or a reference type - we actually just want the size of the
pointer then for that.

Fixes rdar://10335756

llvm-svn: 146785
2011-12-16 23:42:45 +00:00
Jakob Stoklund Olesen 9790187b6c Fix off-by-one error in bucket sort.
The bad sorting caused a misaligned basic block when building 176.vpr in
ARM mode.

<rdar://problem/10594653>

llvm-svn: 146767
2011-12-16 23:00:05 +00:00
Benjamin Kramer 9ca2e7293b Hexagon: Fix a nasty order-of-initialization bug.
Reenable the tests.

llvm-svn: 146750
2011-12-16 19:08:59 +00:00
Manuel Klimek 2c899a181c Adds a JSON parser and a benchmark (json-bench) to catch performance regressions.
llvm-svn: 146735
2011-12-16 13:09:10 +00:00
Chris Lattner 3fdf98c60f By popular demand, link up types by name if they are isomorphic and one is an
autorenamed version of the other.   This makes the IR easier to read, because
we don't end up with random renamed versions of the types after LTO'ing a large app.

llvm-svn: 146728
2011-12-16 08:36:07 +00:00
Craig Topper a4d411cb1b Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes.
llvm-svn: 146726
2011-12-16 08:06:31 +00:00
Kostya Serebryany 561dade58d [asan] add a test for instrumenting globals
llvm-svn: 146718
2011-12-16 01:28:19 +00:00
Eli Friedman 64944090ff Make sure we correctly note the existence of an i8 immediate for vblendvps and friends, so we compute fixups correctly. PR11586.
llvm-svn: 146709
2011-12-15 23:46:18 +00:00
Jim Grosbach a47294e24d ARM NEON VCLE is an alias for VCGE w/ the source operands reversed.
llvm-svn: 146699
2011-12-15 22:56:33 +00:00
Jim Grosbach 4a5c887370 ARM NEON VTBL/VTBX assembly parsing and encoding.
llvm-svn: 146691
2011-12-15 22:27:11 +00:00
Chad Rosier 41dbf59e12 Add missing zmovl AVX patterns which were causing crashes.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

llvm-svn: 146689
2011-12-15 22:11:31 +00:00
Chad Rosier 75ed9dcbc6 Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

llvm-svn: 146684
2011-12-15 21:34:44 +00:00
Lang Hames 918f976e66 Set specific target cpu for testcase.
llvm-svn: 146678
2011-12-15 20:22:34 +00:00
Lang Hames 2d6d3a2f96 Added test case for r146671.
llvm-svn: 146675
2011-12-15 19:56:07 +00:00
Hal Finkel 750366f014 Add a test case to make sure that the nop really does follow the bl on ppc64 elf
llvm-svn: 146666
2011-12-15 17:59:23 +00:00
Eli Friedman ef7b2f2532 Fix test.
llvm-svn: 146642
2011-12-15 04:52:47 +00:00
Eli Friedman a45ab503f6 Make constant folding for GEPs a bit more aggressive.
llvm-svn: 146639
2011-12-15 04:33:48 +00:00
Eli Friedman 2ec824966d Don't try to form FGETSIGN after legalization; it is possible in some cases, but the existing code can't do it correctly. PR11570.
llvm-svn: 146630
2011-12-15 02:07:20 +00:00
Chad Rosier 1940baa76b Add support for lowering fneg when AVX is enabled.
rdar://10566486

llvm-svn: 146625
2011-12-15 01:02:25 +00:00
Pete Cooper b33c297f14 Added InstCombine for "select cond, ~cond, x" type patterns
These can be reduced to "~cond & x" or "~cond | x"

llvm-svn: 146624
2011-12-15 00:56:45 +00:00
Eli Friedman 16ad2905a3 Make loop preheader insertion in LoopSimplify handle the case where the loop header is a landing pad correctly (by splitting the landingpad out of the loop header). Make some adjustments to the rest of LoopSimplify to make it clear that the rest of LoopSimplify isn't making bad assumptions about the presence of landing pads. PR11575.
llvm-svn: 146621
2011-12-15 00:50:34 +00:00
Dan Gohman 75d7d5e988 Move Instruction::isSafeToSpeculativelyExecute out of VMCore and
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.

llvm-svn: 146610
2011-12-14 23:49:11 +00:00
Jim Grosbach a8aa30b620 ARM NEON VLD2/VST2 lane indexed assembly parsing and encoding.
llvm-svn: 146605
2011-12-14 23:25:46 +00:00
Devang Patel c268688643 Do not sink instruction, if it is not profitable.
On ARM, peephole optimization for ABS creates a trivial cfg triangle which tempts machine sink to sink instructions in code which is really straight line code. Sometimes this sinking may alter register allocator input such that use and def of a reg is divided by a branch in between, which may result in extra spills. Now mahine sink avoids sinking if final sink destination is post dominator.

Radar 10266272.

llvm-svn: 146604
2011-12-14 23:20:38 +00:00
Kevin Enderby ad41ab5015 Improve the implementation of .incbin directive by replacing a loop by using
getStreamer().EmitBytes.  Suggestion by Benjamin Kramer!

llvm-svn: 146599
2011-12-14 22:34:45 +00:00
Andrew Trick e0ced62119 LSR: Fold redundant bitcasts on-the-fly.
llvm-svn: 146597
2011-12-14 22:07:19 +00:00
Jim Grosbach bb18fb4f52 ARM NEON fix alignment encoding for VST2 w/ writeback.
Add tests for w/ writeback instruction parsing and encoding.

llvm-svn: 146594
2011-12-14 21:49:24 +00:00
Kevin Enderby 109f25c966 Add the .incbin directive which takes the binary data from a file and emits
it to the streamer.  rdar://10383898

llvm-svn: 146592
2011-12-14 21:47:48 +00:00
Jim Grosbach 8d24618975 ARM NEON VST2 assembly parsing and encoding.
Work in progress. Parsing for non-writeback, single spaced register lists
works now. The rest have the representations better factored, but still
need more to be able to parse properly.

llvm-svn: 146579
2011-12-14 19:35:22 +00:00
Stepan Dyatkovskiy d7b2bb3bdd Fix for bug #11429: Wrong behaviour for switches. Small improvement for code size heuristics.
llvm-svn: 146578
2011-12-14 19:19:17 +00:00
Dan Gohman bd944b4153 It turns out that clang does use pointer-to-function types to
point to ARC-managed pointers sometimes. This fixes rdar://10551239.

llvm-svn: 146577
2011-12-14 19:10:53 +00:00
Akira Hatanaka bff84e1914 Add support for local dynamic TLS model in LowerGlobalTLSAddress. Direct object
emission is not supported yet, but a patch that adds the support should follow
soon.

llvm-svn: 146572
2011-12-14 18:26:41 +00:00
Jim Grosbach a342667fd0 ARM/Thumb2 'cmp rn, #imm' alias to cmn.
When 'cmp rn #imm' doesn't match due to the immediate not being representable,
but 'cmn rn, #-imm' does match, use the latter in place of the former, as
it's equivalent.

rdar://10552389

llvm-svn: 146567
2011-12-14 17:30:24 +00:00
Jim Grosbach ab5830e51b ARM assembler support for the target-specific .req directive.
rdar://10549683

llvm-svn: 146543
2011-12-14 02:16:11 +00:00
Evan Cheng 7fae11b231 - Add MachineInstrBundle.h and MachineInstrBundle.cpp. This includes a function
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
  and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
  prevent IT blocks from being broken apart.

llvm-svn: 146542
2011-12-14 02:11:42 +00:00
Chad Rosier 4020ae75ea Add newline at EOF.
llvm-svn: 146538
2011-12-14 01:34:39 +00:00
Jim Grosbach 485e5622f4 Thumb2 assembler aliases for "mov(shifted register)"
rdar://10549767

llvm-svn: 146520
2011-12-13 22:45:11 +00:00
Jim Grosbach 18bf363078 ARM LDM/STM system instruction variants.
rdar://10550269

llvm-svn: 146519
2011-12-13 21:48:29 +00:00
Jim Grosbach dce106940e Test for 146516
llvm-svn: 146517
2011-12-13 21:06:59 +00:00
Jim Grosbach 1f1a3598c2 ARM thumb2 parsing of "rsb rd, rn, #0".
rdar://10549741

llvm-svn: 146515
2011-12-13 20:50:38 +00:00
Jim Grosbach 4b0844e191 ARM NEON two-operand aliases for VQDMULH.
llvm-svn: 146514
2011-12-13 20:40:37 +00:00
Jim Grosbach 561e4e18cf ARM pre-UAL NEG mnemonic for convenience when porting old code.
llvm-svn: 146511
2011-12-13 20:23:22 +00:00
Chad Rosier 563de603f7 [fast-isel] Unaligned loads of floats are not supported. Therefore, convert to a regular
load and then move the result from a GPR to a FPR.

llvm-svn: 146502
2011-12-13 19:22:14 +00:00
Akira Hatanaka 7200123fa3 Add test/MC/Mips/dg.exp.
llvm-svn: 146472
2011-12-13 04:12:49 +00:00
Akira Hatanaka 341850fdc6 Move direct object emitter test to directory test/MC/Mips. Rename it to
elf-relsym.ll.

llvm-svn: 146470
2011-12-13 03:50:34 +00:00
Akira Hatanaka e41963ce47 Relocation against a symbol, instead of against section. We had some extreme
test cases where there were a lot of relocations applied relative to a large
rodata section. Gas would create a symbol for each of these whereas we would
be relative to the beginning of the rodata section. This change mimics what
gas does.

Patch by Jack Carter.

llvm-svn: 146468
2011-12-13 02:27:40 +00:00
Nick Lewycky 86ffb03c79 Don't rely on a particular version string for llvm.
llvm-svn: 146456
2011-12-13 00:34:14 +00:00
Tony Linthicum 525ca5fc69 Temporarily disable Hexagon tests. They are failing on OS X
llvm-svn: 146455
2011-12-13 00:33:45 +00:00
Akira Hatanaka 9e5908ae3a Test case for r146432 by Jack Carter.
llvm-svn: 146433
2011-12-12 22:41:39 +00:00
Bob Wilson fadc2c83e5 Implement 'e' and 'f' modifiers for Neon inline asm. <rdar://problem/10551006>
These modifiers simply select either the low or high D subregister of a Neon
Q register.  I've also removed the unimplemented 'p' modifier, which turns out
to be a bit different than the comment here suggests and as far as I can tell
was only intended for internal use in Apple's version of gcc.

llvm-svn: 146417
2011-12-12 21:45:15 +00:00
Tony Linthicum 1213a7a57f Hexagon backend support
llvm-svn: 146412
2011-12-12 21:14:40 +00:00
Joerg Sonnenberger 45c4164166 Only replace fwrite with fputc, if the return value is unused.
llvm-svn: 146411
2011-12-12 20:18:31 +00:00
Jan Sjödin 7c0face455 XOP instructions and encoding tests.
llvm-svn: 146407
2011-12-12 19:37:49 +00:00
Roman Divacky 735cb8bcdc Add support for gnu_indirect_function.
llvm-svn: 146377
2011-12-12 17:34:04 +00:00
Chandler Carruth 6b0e34c445 Manually upgrade the test suite to specify the flag to cttz and ctlz.
I followed three heuristics for deciding whether to set 'true' or
'false':

- Everything target independent got 'true' as that is the expected
  common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
  set the flag in the way that exercises the most of codegen. For most
  architectures this is also the likely path from a GCC builtin, with
  'true' being set. It will (eventually) require lowering away that
  difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
  operation should be tested.

Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.

llvm-svn: 146370
2011-12-12 11:59:10 +00:00
Chandler Carruth f13db84794 Add an explicit test of the auto-upgrade functionality for the new
intrinsic syntax.

Now that this is explicitly covered, I plan to upgrade the existing test
suite to use an explicit immediate. Note that I plan to specify 'true'
in most places rather than the auto-upgraded value as that is the far
more common value to end up here as that is the value coming from GCC's
builtins. The only place I'm likely to put a 'false' in is when testing
x86 which actually has different instructions for the two variants.

llvm-svn: 146369
2011-12-12 11:23:11 +00:00
Chandler Carruth 026cc37e48 Teach the verifier to reject all non-constant arguments to the second
argument of the cttz and ctlz intrinsics.

llvm-svn: 146360
2011-12-12 04:36:02 +00:00
Stepan Dyatkovskiy 4683740967 Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Third attempt: simplified checks in test for armv7-apple-darwin11.
llvm-svn: 146341
2011-12-11 14:35:48 +00:00
Chandler Carruth 1d76d4196a Don't assume things about the exact details of the LLVM version number,
such as what VCS information is attached.

llvm-svn: 146333
2011-12-10 21:40:31 +00:00
Chad Rosier 1c468af854 Revert associate SelectInsertValue test as well.
llvm-svn: 146332
2011-12-10 21:34:28 +00:00
Chad Rosier 6641294e3b Revert r146322 to appease buildbots. Original commit message:
Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for
FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Second
attempt.

llvm-svn: 146328
2011-12-10 19:55:03 +00:00
Stepan Dyatkovskiy df0b779e9f Fixed bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2). Second attempt.
llvm-svn: 146322
2011-12-10 08:42:24 +00:00
Hal Finkel 67a7f18faf Make CR spill and restore use a reserved register. These operations cannot use the register scavenger because the scavenger can only scavenge one register and frame-index elimination may have already grabbed it.
llvm-svn: 146318
2011-12-10 04:50:53 +00:00
Rafael Espindola c7f355b8e1 Handle expressions of the form _GLOBAL_OFFSET_TABLE_-symbol the same way gas
does. The _GLOBAL_OFFSET_TABLE_ is still magical in that we get a R_386_GOTPC,
but it doesn't change the immediate in the same way as when the expression
has no right hand side symbol.

llvm-svn: 146311
2011-12-10 02:28:43 +00:00
Eli Friedman 4e36a934dc Splats can contain undef's; make sure to handle them correctly. PR11526.
llvm-svn: 146299
2011-12-09 23:54:42 +00:00
Jim Grosbach 6192b6570d ARM assembly aliases for BIC<-->AND (immediate).
When the immediate operand of an AND or BIC instruction isn't representable
in the immediate field of the instruction, but the bitwise negation of the
immediate is, assemble the instruction as the inverse operation instead
with the inverted immediate as the operand.

rdar://10550057

llvm-svn: 146283
2011-12-09 22:02:17 +00:00
Evan Cheng 1d54d2210a Update test to something more sensible.
llvm-svn: 146282
2011-12-09 21:54:10 +00:00
Jim Grosbach d146a02c79 ARM assembly parsing and encoding for VLD2 with writeback.
Refactor the instructions into fixed writeback and register-stride
writeback variants to simplify the offset operand (no more optional
register operand using reg0). This is a simpler representation and allows
the assembly parser to more easily handle these instructions.

Add tests for the instruction variants now supported.

llvm-svn: 146278
2011-12-09 21:28:25 +00:00
Chad Rosier dd998ff4df [fast-isel] Add support for selecting insertvalue.
rdar://10530851

llvm-svn: 146276
2011-12-09 20:09:54 +00:00
Rafael Espindola 7e0a793183 Handle reloc_signed_4byte in here. Not doing so was a regression from my
previous commit. It is strange that we see it in 32 bits. We already
have a fixme about it.

llvm-svn: 146273
2011-12-09 19:57:29 +00:00
Kevin Enderby e7739d484f The second part of support for generating dwarf for assembly source files. This
generates the dwarf Compile Unit DIE and a dwarf subprogram DIE for each
non-temporary label.

The next part will be to get the clang driver to enable this when assembling
a .s file.  rdar://9275556

llvm-svn: 146262
2011-12-09 18:09:40 +00:00
Benjamin Kramer 16bbfbec66 X86: Add patterns for the various rounding ops for SSE4.1 and AVX.
llvm-svn: 146257
2011-12-09 15:44:03 +00:00
Andrew Trick d04d152998 Add -unroll-runtime for unrolling loops with run-time trip counts.
Patch by Brendon Cahoon!

This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.

llvm-svn: 146245
2011-12-09 06:19:40 +00:00
Evan Cheng 5895fa79d6 Forgot setting -march.
llvm-svn: 146244
2011-12-09 06:15:00 +00:00
Rafael Espindola 0a7f336475 Handle the case of the magical _GLOBAL_OFFSET_TABLE_ showing up in a
symbol difference. This matches gas behavior and fixes PR11513.

We still don't handle _GLOBAL_OFFSET_TABLE_ in data sections.

llvm-svn: 146238
2011-12-09 03:03:58 +00:00
Akira Hatanaka 8e16aac534 jalr should use t9 ($25) for indirect calls regardless of the relocation model
specified.

llvm-svn: 146229
2011-12-09 01:45:12 +00:00
Eli Friedman 053a724483 Fix a couple of logic bugs in TargetLowering::SimplifyDemandedBits. PR11514.
llvm-svn: 146219
2011-12-09 01:16:26 +00:00
Nick Lewycky fe970725cc Fix infinite loop in DSE when deleting a free in a reachable loop that's also
trivially infinite.

llvm-svn: 146197
2011-12-08 22:36:35 +00:00
Evan Cheng b96bca81e7 Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417
llvm-svn: 146196
2011-12-08 22:30:45 +00:00
Jim Grosbach db731be7b8 ARM 64-bit VEXT assembly uses a .64 suffix, not .32, amazingly enough.
llvm-svn: 146194
2011-12-08 22:19:04 +00:00
Jim Grosbach ba7d6ed05d ARM VSHR implied destination operand form aliases.
llvm-svn: 146192
2011-12-08 22:06:06 +00:00
Evan Cheng 2a217be25f Add various missing AVX patterns which was causing crashes. Sadly, the generated
code looks pretty bad compared to SSE.

rdar://10538793

llvm-svn: 146191
2011-12-08 22:05:28 +00:00
Jim Grosbach 3a97d946d2 Tidy up a bit.
llvm-svn: 146190
2011-12-08 22:04:40 +00:00
Jim Grosbach ab9c8bb45b ARM VSUB implied destination operand form aliases.
llvm-svn: 146182
2011-12-08 20:56:26 +00:00
Jim Grosbach 27a33edfa0 Tidy up a bit.
llvm-svn: 146181
2011-12-08 20:53:19 +00:00
Jim Grosbach 66c9ad7642 ARM VQADD implied destination operand form aliases.
llvm-svn: 146179
2011-12-08 20:49:43 +00:00
Jim Grosbach e9ee1092e1 ARM a few more VMUL implied destination operand form aliases.
llvm-svn: 146177
2011-12-08 20:42:35 +00:00
Owen Anderson 0b9b9da6c8 Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise.
llvm-svn: 146171
2011-12-08 19:32:14 +00:00
Evan Cheng 3294538546 Add test for r146163.
llvm-svn: 146167
2011-12-08 19:21:39 +00:00
Daniel Dunbar c09e4593b2 Revert r146143, "Fix bug 9905: Failure in code selection for llvm intrinsics
sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP,
FEXP2).", it is failing tests.

llvm-svn: 146157
2011-12-08 17:32:18 +00:00
NAKAMURA Takumi 0faa233439 test/CodeGen/X86/vec_compare-2.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 146152
2011-12-08 15:24:09 +00:00
Nadav Rotem 26edb291ac Fix a bug in the integer-promotion of bitcast operations on vector types.
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.

llvm-svn: 146150
2011-12-08 13:10:01 +00:00
Stepan Dyatkovskiy a4bcf27dae Fix bug 9905: Failure in code selection for llvm intrinsics sqrt/exp (fix for FSQRT, FSIN, FCOS, FPOWI, FPOW, FLOG, FLOG2, FLOG10, FEXP, FEXP2).
llvm-svn: 146143
2011-12-08 07:55:03 +00:00
Jim Grosbach 00326406d4 ARM NEON two-operand aliases for VSHL(immediate).
llvm-svn: 146125
2011-12-08 01:30:04 +00:00
Jim Grosbach f10a635eb4 ARM NEON two-operand aliases for VSHL(register).
llvm-svn: 146123
2011-12-08 01:12:35 +00:00
Jim Grosbach 6600f520b0 ARM optional destination operand variants for VEXT instructions.
llvm-svn: 146114
2011-12-08 00:43:47 +00:00
Jim Grosbach 5ff64c7141 Tidy up.
llvm-svn: 146113
2011-12-08 00:41:54 +00:00
Jim Grosbach 3050625a50 ARM assembler aliases for "add Rd, #-imm" to "sub Rd, #imm".
llvm-svn: 146111
2011-12-08 00:31:07 +00:00
Jim Grosbach 3b559ff3c5 ARM assembly, allow 'asl' as a synonym for 'lsl' in shifted-register operands.
For 'gas' compatibility.

llvm-svn: 146106
2011-12-07 23:40:58 +00:00
Akira Hatanaka ae378af667 32 to 64-bit zext pattern.
llvm-svn: 146096
2011-12-07 23:14:41 +00:00
Jim Grosbach 90d961250b ARM two-operand aliases for VAND/VEOR/VORR instructions.
llvm-svn: 146095
2011-12-07 23:08:12 +00:00
Jim Grosbach 3744a7febb ARM two-operand aliases for VADDW instructions.
llvm-svn: 146093
2011-12-07 23:01:10 +00:00
Jim Grosbach 552691556c ARM two-operand aliases for VADD instructions.
llvm-svn: 146091
2011-12-07 22:52:54 +00:00
Akira Hatanaka b2e05cb6b1 64-bit WrapperPICPat patterns.
llvm-svn: 146086
2011-12-07 22:11:43 +00:00
Akira Hatanaka c5b5a8d8b1 Modify LowerFCOPYSIGN to handle Mips64.
llvm-svn: 146080
2011-12-07 21:48:50 +00:00
Akira Hatanaka 4a04a56a36 Fix 64-bit immediate patterns.
llvm-svn: 146059
2011-12-07 20:10:24 +00:00
Jim Grosbach d6ae4ba002 Darwin assembler improved relocs when w/o subsections_via_symbols.
When the file isn't being built with subsections-via-symbols, symbol
differences involving non-local symbols can be resolved more aggressively.
Needed for gas compatibility.

llvm-svn: 146054
2011-12-07 19:46:59 +00:00
Jim Grosbach 18b0e5dca0 Thumb2 alias for long-form pop and friends.
rdar://10542474

llvm-svn: 146046
2011-12-07 18:32:28 +00:00
Jim Grosbach 7f882399b8 ARM support the .arm and .thumb directives for assembly mode switching.
llvm-svn: 146042
2011-12-07 18:04:19 +00:00
Jim Grosbach 721042fa3a ARM NEON VCLT(register) is a pseudo aliasing VCGT(register).
llvm-svn: 146039
2011-12-07 17:51:15 +00:00
Jim Grosbach a4337ced68 Tidy up. Move MachO tests to MachO directory.
llvm-svn: 146038
2011-12-07 17:50:28 +00:00
Eli Friedman ed8b3e38ec Support vector bitcasts in the AsmPrinter. PR11495.
llvm-svn: 146001
2011-12-07 00:50:54 +00:00
Eli Friedman 0e58cba286 Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves correctly. PR11494.
llvm-svn: 145996
2011-12-07 00:11:56 +00:00
Hal Finkel 0fc34bc2d3 delaying restore-cr changed assigned registers in some tests
llvm-svn: 145963
2011-12-06 20:55:46 +00:00
Hal Finkel 0702bc1b28 add a test case that uses RESTORE_CR
llvm-svn: 145962
2011-12-06 20:55:41 +00:00
Justin Holewinski 04424665c3 PTX: Continue to fix up the register mess.
llvm-svn: 145947
2011-12-06 17:39:48 +00:00
Craig Topper 6572e0f203 Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those.
llvm-svn: 145927
2011-12-06 09:04:59 +00:00
NAKAMURA Takumi 51416d5f00 test/MC: Introduce MC/MachO/ARM, and relocate relax-thumb2-branches.s into it.
FIXME: Restore more other arch-dependent MachO tests. (eg. r126401 and r133856)
llvm-svn: 145925
2011-12-06 06:48:26 +00:00
Jim Grosbach e303e24d77 ARM mode 'mul' operand ordering tweak.
Same as r145922, just for ARM mode.

llvm-svn: 145923
2011-12-06 05:28:00 +00:00
Jim Grosbach 5f143be8c5 Thumb2: MUL two-operand form encoding operand order fix.
Fix the alias to encode 'mul r5, r6' as if it were 'mul r5, r6, r5' so we
match gas.

rdar://10532439

llvm-svn: 145922
2011-12-06 05:03:45 +00:00
Craig Topper bf41eb3a98 Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted.
llvm-svn: 145921
2011-12-06 04:59:07 +00:00
Jim Grosbach 175c7d0da5 Thumb2 encoding choice correction for PLD.
Using encoding T1 for offset of #0 and encoding T2 for #-0.

rdar://10532413

llvm-svn: 145919
2011-12-06 04:49:29 +00:00
NAKAMURA Takumi 5bdc0fbabd test/MC: Move relax-thumb2-branches.s from MC/MachO/ to MC/ARM.
MC/MachO assumes x86.

llvm-svn: 145916
2011-12-06 03:56:05 +00:00
Andrew Trick 5df9096584 LSR: prune undesirable formulae early.
It's always good to prune early, but formulae that are unsatisfactory
in their own right need to be removed before running any other pruning
heuristics. We easily avoid generating such formulae, but we need them
as an intermediate basis for forming other good formulae.

llvm-svn: 145906
2011-12-06 03:13:31 +00:00
Chad Rosier c77830d21e [arm-fast-isel] Doublewords only require word-alignment.
rdar://10528060

llvm-svn: 145891
2011-12-06 01:44:17 +00:00
Jakob Stoklund Olesen 2e05db2fa0 Align ARM constant pool islands via their basic block.
Previously, all ARM::CONSTPOOL_ENTRY instructions had a hardwired
alignment of 4 bytes emitted by ARMAsmPrinter.  Now the same alignment
is set on the basic block.

This is in preparation of supporting ARM constant pool islands with
different alignments.

llvm-svn: 145890
2011-12-06 01:43:02 +00:00
Jim Grosbach 9105085b4a Fix ARM handling of tBcc branch relaxation.
rdar://10069056

llvm-svn: 145885
2011-12-06 01:08:19 +00:00
Chad Rosier 8abf65a130 Probably not a good idea to convert a single vector load into a memcpy. We
don't do this now, but add a test case to prevent this from happening in the
future.
Additional test for rdar://9892684

llvm-svn: 145879
2011-12-06 00:19:08 +00:00
Chad Rosier 19446a07a7 Make the MemCpyOptimizer a bit more aggressive. I can't think of a scenerio
where this would be bad as the backend shouldn't have a problem inlining small
memcpys.
rdar://10510150

llvm-svn: 145865
2011-12-05 22:37:00 +00:00
Jim Grosbach b8c719ccc6 Tweak ADDrr fix. Bad check for explicit .w
llvm-svn: 145863
2011-12-05 22:27:04 +00:00
Jim Grosbach 8b5e92577b Update tests for r145860. Add a few new ones.
llvm-svn: 145861
2011-12-05 22:21:28 +00:00
Akira Hatanaka 20cee2eba1 Add definitions of 64-bit extract and insert instrucions and make
PerformANDCombine and PerformOrCombine aware of them. Test cases are included
too.

llvm-svn: 145853
2011-12-05 21:26:34 +00:00
Jim Grosbach ec9ba98299 Thumb2 prefer encoding T3 to T4 for ADD/SUB immediate instructions.
rdar://10529348

llvm-svn: 145851
2011-12-05 21:06:26 +00:00
Akira Hatanaka 34e3df76f9 Have LowerJumpTable support Mips64. Modify 2010-07-20-Switch.ll to test N64 and
O32 with relocation-model=pic too.

llvm-svn: 145850
2011-12-05 21:03:03 +00:00
Jim Grosbach fdf9e1587a ARM assembly parsing for the rest of the VMUL data type aliases.
Finish up rdar://10522016.

llvm-svn: 145846
2011-12-05 20:29:59 +00:00
Hal Finkel 97a6028b3a Add test case - this input used to crash because of duplicate generation of SPILL_CRs
llvm-svn: 145820
2011-12-05 17:55:22 +00:00
Hal Finkel 8f6834dfa5 enable PPC register scavenging by default (update tests and remove some FIXMEs)
llvm-svn: 145819
2011-12-05 17:55:17 +00:00
Hal Finkel e18c72689c remove wasted space for extra bit copies of CR2 subregs
llvm-svn: 145817
2011-12-05 17:55:06 +00:00
NAKAMURA Takumi e6efe405de test/CodeGen/X86/pointer-vector.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 145805
2011-12-05 07:54:57 +00:00
Nadav Rotem 3924cb0267 Add support for vectors of pointers.
llvm-svn: 145801
2011-12-05 06:29:09 +00:00
Anton Korobeynikov 965e0c6de2 Emit the ctors in the proper order on ARM/EABI.
Maybe some targets should use this as well.

Patch by Evgeniy Stepanov!

llvm-svn: 145781
2011-12-03 23:49:37 +00:00
Venkatraman Govindaraju 6dae604f50 Sparc CodeGen: Fix AnalyzeBranch for PR 10282. Removing addSuccessor() since
AnalyzeBranch doesn't change the successor, just the order.

llvm-svn: 145779
2011-12-03 21:24:48 +00:00
Sanjoy Das 006e43bcc0 Check for stack space more intelligently.
libgcc sets the stack limit field in TCB to 256 bytes above the actual
allocated stack limit.  This means if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit.  This should result in lesser calls to __morestack.

llvm-svn: 145766
2011-12-03 09:32:07 +00:00
Sanjoy Das 165ca1d4ba Fix a bug in the x86-32 code generated for segmented stacks.
Currently LLVM pads the call to __morestack with a add and sub of 8
bytes to esp.  This isn't correct since __morestack expects the call
to be followed directly by a ret.

This commit also adjusts the relevant test-case.

llvm-svn: 145765
2011-12-03 09:21:07 +00:00
Chad Rosier ec3b77e00d [arm-fast-isel] Unaligned stores of floats require special care.
rdar://10510150

llvm-svn: 145742
2011-12-03 02:21:57 +00:00
Pete Cooper e03fe83d98 Fixed deadstoreelimination bug where negative indices were incorrectly causing the optimisation to occur
Turns out long long + unsigned long long is unsigned.  Doh!

Fixes http://llvm.org/bugs/show_bug.cgi?id=11455

llvm-svn: 145731
2011-12-03 00:04:30 +00:00
Chad Rosier 0155a63513 Add support for constant folding the pow intrinsic.
rdar://10514247

llvm-svn: 145730
2011-12-03 00:00:03 +00:00
Akira Hatanaka 430f917fbe Test cases for 64-bit multiplication and division.
llvm-svn: 145717
2011-12-02 22:31:36 +00:00
Akira Hatanaka bbc5555bee Fix test cases to use FileCheck.
llvm-svn: 145716
2011-12-02 22:28:09 +00:00
Jim Grosbach 7276397f41 ARM tests for VLD1 single lane w/ writeback.
llvm-svn: 145713
2011-12-02 22:03:52 +00:00
Chad Rosier 9fd0e55e91 [arm-fast-isel] After promoting a function parameter be sure to update the
argument value type.  Otherwise, the sign/zero-extend has no effect on arguments
passed via the stack (i.e., undefined high-order bits).
rdar://10515467

llvm-svn: 145701
2011-12-02 20:25:18 +00:00
Hal Finkel d87f7af1f3 specify cpu for test to fix failure on some darwin systems with a g4+ cpu
llvm-svn: 145699
2011-12-02 19:38:17 +00:00
Jim Grosbach e7dcbc8691 Clean up aliases for ARM VLD1 single-lane assembly parsing a bit.
Add the 16-bit lane variants while I'm at it.

llvm-svn: 145693
2011-12-02 18:52:30 +00:00
Craig Topper abeb79eee3 Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors.
llvm-svn: 145680
2011-12-02 07:16:01 +00:00
Hal Finkel 9286705955 adjust the instruction ordering in some PPC tests: changes due to postRA haz. rec.
llvm-svn: 145678
2011-12-02 04:58:12 +00:00
Chad Rosier 3367123b12 Prevent library calls from being folded if -fno-builtin has been specified.
rdar://10500969

llvm-svn: 145639
2011-12-01 22:14:50 +00:00
Pete Cooper fdddc27143 Improved fix for abs(val) != 0 to check other similar case. Also fixed style issues and confusing comment
llvm-svn: 145618
2011-12-01 19:13:26 +00:00
Eric Christopher 9da7f305a4 For 64-bit the rest of the general regs are ok for the q constraint. Make
sure we can emit both the high and low versions of those registers.

Fixes rdar://10392864

llvm-svn: 145579
2011-12-01 08:12:41 +00:00
Eli Friedman d61887dd0a Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>.
llvm-svn: 145573
2011-12-01 04:49:21 +00:00
Pete Cooper 3b7f35bf08 Removed use of grep from test and moved it to be with other icmp tests
llvm-svn: 145570
2011-12-01 04:35:26 +00:00
Pete Cooper bc5c524b71 Added instcombine pattern to spot comparing -val or val against 0.
(val != 0) == (-val != 0) so "abs(val) != 0" becomes "val != 0"

Fixes <rdar://problem/10482509>

llvm-svn: 145563
2011-12-01 03:58:40 +00:00
Jan Sjödin 9430e284a9 Support for encoding all FMA4 instructions and tablegen patterns for all
remaining FMA4 instructions and intrinsics with tests.

llvm-svn: 145525
2011-11-30 22:09:42 +00:00
Eli Friedman 6cff9df298 Make GlobalMerge honor the preferred alignment on globals without an explicitly specified alignment.
<rdar://problem/10497732>.

llvm-svn: 145523
2011-11-30 21:54:15 +00:00
Jim Grosbach 7d8517b1d4 Add some tests for all-lanes VLD1 parsing.
llvm-svn: 145512
2011-11-30 19:37:38 +00:00
Nadav Rotem 0a1801015c Add test arch to make it pass on non x86 targets
llvm-svn: 145498
2011-11-30 17:34:28 +00:00
Nadav Rotem 66427bcce9 Add a tripple to the test
llvm-svn: 145489
2011-11-30 11:20:56 +00:00
Nadav Rotem 96923cc2bb X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479).
llvm-svn: 145488
2011-11-30 10:13:37 +00:00
Andrew Trick 613c67e475 Better test case found in duplicate PR10570.
llvm-svn: 145484
2011-11-30 06:26:42 +00:00
Andrew Trick ceafa2c746 LSR: handle the expansion of phi operands that use postinc forms of the IV.
Fixes PR11431: SCEVExpander::expandAddRecExprLiterally(const llvm::SCEVAddRecExpr*): Assertion `(!isa<Instruction>(Result) || SE.DT->dominates(cast<Instruction>(Result), Builder.GetInsertPoint())) && "postinc expansion does not dominate use"' failed.

llvm-svn: 145482
2011-11-30 06:07:54 +00:00
Chad Rosier 82e1bd8e94 Add support for sqrt, sqrtl, and sqrtf in TargetLibraryInfo. Disable
(fptrunc (sqrt (fpext x))) -> (sqrtf x) transformation if -fno-builtin is 
specified.
rdar://10466410

llvm-svn: 145460
2011-11-29 23:57:10 +00:00
Jakob Stoklund Olesen f50d2eafdb FileCheckize.
llvm-svn: 145452
2011-11-29 23:09:16 +00:00
Akira Hatanaka dc25f9f38a Change names for MIPS "generic" processors defined in Mips.td to match what GNU
tools use. Patch by Simon Atanasyan.

"mips32r1" => "mips32"
"4ke" => mips32r2"
"mips64r1" => "mips64"

llvm-svn: 145451
2011-11-29 23:08:41 +00:00
Jim Grosbach 5ee209ce3a ARM assembly parsing and encoding for four-register VST1.
llvm-svn: 145450
2011-11-29 22:58:48 +00:00
Evan Cheng 648e48d02e Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins.
llvm-svn: 145448
2011-11-29 22:48:34 +00:00
Jim Grosbach 2a9c43649a Enable some VST1 tests and add a few more.
llvm-svn: 145443
2011-11-29 22:40:32 +00:00
Jakob Stoklund Olesen bde32d36bb Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions.
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.

This also makes the AVX variants redundant.

llvm-svn: 145440
2011-11-29 22:27:25 +00:00
Chad Rosier 46addb9e07 If fast-isel fails, remove dead instructions generated during the failed
attempt.  

llvm-svn: 145425
2011-11-29 19:40:47 +00:00
Duncan Sands ca6f8ddbf8 Fix a theoretical problem (not seen in the wild): if different instances of a
weak variable are compiled by different compilers, such as GCC and LLVM, while
LLVM may increase the alignment to the preferred alignment there is no reason to
think that GCC will use anything more than the ABI alignment.  Since it is the
GCC version that might end up in the final program (as the linkage is weak), it
is wrong to increase the alignment of loads from the global up to the preferred
alignment as the alignment might only be the ABI alignment.

Increasing alignment up to the ABI alignment might be OK, but I'm not totally
convinced that it is.  It seems better to just leave the alignment of weak
globals alone.

llvm-svn: 145413
2011-11-29 18:26:38 +00:00
Michael J. Spencer de3a2118db MC/X86/COFF: Allow quotes in names when targeting MS/Windows,
as MC is the only assembler we support.

This splits MS/Windows and GNU/Windows ASM infos into two seperate classes.
While there is currently only one difference, full MS C++ ABI support will
require many more.

llvm-svn: 145409
2011-11-29 18:00:06 +00:00
Danil Malyshev cbe72fc959 Fixed ObjectFile functions:
- getSymbolOffset() renamed as getSymbolFileOffset()
- getSymbolFileOffset(), getSymbolAddress(), getRelocationAddress() returns same result for ELFObjectFile, MachOObjectFile and COFFObjectFile.
- added getRelocationOffset()
- fixed MachOObjectFile::getSymbolSize()
- fixed MachOObjectFile::getSymbolSection()
- fixed MachOObjectFile::getSymbolOffset() for symbols without section data.

llvm-svn: 145408
2011-11-29 17:40:10 +00:00
Elena Demikhovsky 7a81dea516 Fixed vsqrt.ss intrinsic usage - order of input operands was wrong.
Added a test.
Thanks Bruno for reviewing the patch.

llvm-svn: 145403
2011-11-29 15:00:45 +00:00
Craig Topper 1d63ae3731 Fix shuffle decoding for memory forms for (V)SHUFPS/D.
llvm-svn: 145392
2011-11-29 07:58:09 +00:00
Craig Topper c16db840be Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
llvm-svn: 145390
2011-11-29 07:49:05 +00:00
Craig Topper 12b72def4e Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled.
llvm-svn: 145376
2011-11-29 05:37:58 +00:00
Craig Topper 897a7d4b9c Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2.
llvm-svn: 145370
2011-11-29 03:57:34 +00:00
Andrew Trick d25089f8e0 SCEV fix. In general, Add/Mul expressions should not inherit NSW/NUW.
This reverts r139450, fixes r139453, and adds much needed comments and a
unit test.

llvm-svn: 145367
2011-11-29 02:16:38 +00:00
Andrew Trick 5ec136c57e Filecheckize.
llvm-svn: 145363
2011-11-29 02:05:23 +00:00
Andrew Trick e756031a62 Reenable this IndVars unit test.
SCEV can't optimize undef in all cases, which is a separate issue from this test case.

llvm-svn: 145343
2011-11-29 00:52:04 +00:00
Eli Friedman b3f9b0676a Add a missing safety check to ProcessUGT_ADDCST_ADD. Fixes PR11438.
llvm-svn: 145316
2011-11-28 23:32:19 +00:00
Eli Friedman e7ab1a2f0f Make SelectionDAG::InferPtrAlignment use llvm::ComputeMaskedBits instead of duplicating the logic for globals. Make llvm::ComputeMaskedBits handle GlobalVariables slightly more aggressively, to match what InferPtrAlignment knew how to do.
llvm-svn: 145304
2011-11-28 22:48:22 +00:00
Evan Cheng 4a5b2040e2 Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead.
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621

llvm-svn: 145300
2011-11-28 22:37:34 +00:00
Evan Cheng a4b6404cf0 DAG combine should not increase alignment of loads / stores with alignment less
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431

llvm-svn: 145273
2011-11-28 20:42:56 +00:00
Craig Topper 818a983e93 Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar.
llvm-svn: 145238
2011-11-28 10:14:51 +00:00
NAKAMURA Takumi 8284ec46b6 test/lit.cfg: Enable the feature 'asserts' to check output of llc -version.
llc knows whether he is compiled with -DNDEBUG.
|  Optimized build with assertions.

llvm-svn: 145230
2011-11-28 05:09:15 +00:00
Chris Lattner 251d827d2c remove a test that is using old-style llvm.dbg intrinsics, apparently only
fails on ppc and arm hosts.

llvm-svn: 145188
2011-11-27 18:13:47 +00:00
Chandler Carruth 03adbd46ca Take two on rotating the block ordering of loops. My previous attempt
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

llvm-svn: 145183
2011-11-27 13:34:33 +00:00
Chandler Carruth a054580993 Rework a bit of the implementation of loop block rotation to not rely so
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

llvm-svn: 145179
2011-11-27 09:22:53 +00:00
Chris Lattner ee471c484a remove autoupgrade support for old forms of llvm.prefetch and the old
trampoline forms.  Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.

llvm-svn: 145174
2011-11-27 07:42:04 +00:00
Chris Lattner 6a144a2227 Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
llvm-svn: 145171
2011-11-27 06:54:59 +00:00
Chris Lattner 90ef78c07f remove autoupgrade support for really old-style debug info intrinsics.
I think this is the last of autoupgrade that can be removed in 3.1.
Can the atomic upgrade stuff also go?

llvm-svn: 145169
2011-11-27 06:18:33 +00:00
Chris Lattner 6aa6c0c3b7 remove some old autoupgrade logic
llvm-svn: 145167
2011-11-27 06:10:54 +00:00
Chris Lattner 1c9e5678b8 remove support for reading llvm 2.9 .bc files. LLVM 3.1 is only compatible back to 3.0
llvm-svn: 145164
2011-11-27 05:48:27 +00:00
Wesley Peck 97b3da5433 Add several new instructions supported by the latest MicroBlaze.
These instructions are not generated by the backend yet, this will come in a later commit.

llvm-svn: 145161
2011-11-27 05:16:58 +00:00
Chandler Carruth 9ffb97e631 Introduce a loop block rotation optimization to the new block placement
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

llvm-svn: 145158
2011-11-27 00:38:03 +00:00
Chandler Carruth f156f0cf57 FileCheck-ize this test and make it more precise. This is in preparation
for adding other tests.

llvm-svn: 145143
2011-11-26 08:24:25 +00:00
Eli Friedman a84ad7d0d0 Fix APFloat::convert so that it handles narrowing conversions correctly; it
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.

llvm-svn: 145141
2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes 0f9a1f5e6c This patch contains support for encoding FMA4 instructions and
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

llvm-svn: 145133
2011-11-25 19:33:42 +00:00
Craig Topper d65a444478 Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
llvm-svn: 145126
2011-11-24 22:57:10 +00:00
Benjamin Kramer 651db37352 X86: alias cqo to cqto.
llvm-svn: 145121
2011-11-24 12:02:46 +00:00
Chandler Carruth 7adee1a01a Fix a silly use-after-free issue. A much earlier version of this code
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

llvm-svn: 145120
2011-11-24 11:23:15 +00:00
Chandler Carruth d394bafd2d When adding blocks to the list of those which no longer have any CFG
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

llvm-svn: 145119
2011-11-24 08:46:04 +00:00
Richard Smith 4f9a8081c3 Correctly byte-swap APInts with bit-widths greater than 64.
llvm-svn: 145111
2011-11-23 21:33:37 +00:00
Duncan Sands 81a2af12d6 Fix a crash in which a multiplication was being reported as being both negative
and positive: positive, because it could be directly computed to be positive;
negative, because the nsw flags means it is either negative or undefined (the
multiplication always overflowed).

llvm-svn: 145104
2011-11-23 16:26:47 +00:00
Benjamin Kramer ebcb451874 X86: Use btq for bit tests if the immediate can't be encoded in 32 bits.
Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

llvm-svn: 145103
2011-11-23 13:54:17 +00:00
NAKAMURA Takumi 0b3e996485 test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet.
llvm-svn: 145101
2011-11-23 12:18:22 +00:00
Chandler Carruth 99fe42fbd9 Relax an invariant that block placement was trying to assert a bit
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

llvm-svn: 145100
2011-11-23 10:35:36 +00:00
Elena Demikhovsky 779ba6d7b7 I added several lines in X86 code generator that allow to choose
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.

The patch was reviewed by Bruno.

llvm-svn: 145099
2011-11-23 10:23:16 +00:00
Chandler Carruth 8c68f1f3c8 Handle the case of a no-return invoke correctly. It actually still has
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

llvm-svn: 145098
2011-11-23 08:23:54 +00:00
Bob Wilson ebb44646c4 Enable stack protectors for all arrays, not just char arrays. rdar://5875909
Patch by Bill Wendling.

llvm-svn: 145097
2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen 02845410f9 Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096
2011-11-23 04:03:08 +00:00
Chandler Carruth 4a87aa0c31 Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094
2011-11-23 03:03:21 +00:00
Kostya Serebryany 8b5c7a56a3 [asan] do not instrument threadlocal globals, this is buggy
llvm-svn: 145092
2011-11-23 02:10:54 +00:00
Hal Finkel 6f0ae783fe add basic PPC register-pressure feedback; adjust the vaarg test to match the new register-allocation pattern
llvm-svn: 145065
2011-11-22 16:21:04 +00:00
Chandler Carruth ee54feb6f6 Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062
2011-11-22 13:13:16 +00:00
Rafael Espindola c55e1af137 Add triple to the test.
llvm-svn: 145057
2011-11-22 06:36:25 +00:00
Rafael Espindola 2021f38281 If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
2011-11-22 06:27:18 +00:00
Nick Lewycky 063ae5897c Fix crasher in GVN due to my recent capture tracking changes.
llvm-svn: 145047
2011-11-21 19:42:56 +00:00
Craig Topper 6270d072c5 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
llvm-svn: 145028
2011-11-21 08:26:50 +00:00
Craig Topper d12d6f4b1c Test case for r145026
llvm-svn: 145027
2011-11-21 06:58:09 +00:00
Craig Topper a065238c6e Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
llvm-svn: 145022
2011-11-21 01:12:36 +00:00
NAKAMURA Takumi 76dfa03874 test/CodeGen/X86/block-placement.ll: Relax expressions for Win32.
llvm-svn: 145011
2011-11-20 12:49:45 +00:00
Chandler Carruth 18dfac385b The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010
2011-11-20 11:22:06 +00:00
Benjamin Kramer 650c09aa4d XFAIL this test until I figure out what indvars is doing here (or find someone who does)
llvm-svn: 145008
2011-11-20 11:10:03 +00:00
Chandler Carruth 20df3953d3 Add some comments to the latest test case I added here to document what
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

llvm-svn: 145006
2011-11-20 09:30:40 +00:00
Craig Topper e79761df73 Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
llvm-svn: 145005
2011-11-20 00:12:05 +00:00
Craig Topper a3a6583694 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
llvm-svn: 145004
2011-11-19 22:34:59 +00:00
Chandler Carruth f3dc9eff16 Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994
2011-11-19 10:26:02 +00:00
Craig Topper 6d77f4ae14 Test cases for SSSE3/AVX integer horizontal add/sub.
llvm-svn: 144990
2011-11-19 09:03:33 +00:00
Craig Topper de6b73bb4d Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
llvm-svn: 144987
2011-11-19 07:07:26 +00:00
Andrew Trick 6b4d578f54 Fix a corner case in updating LoopInfo after fully unrolling an outer loop.
The loop tree's inclusive block lists are painful and expensive to
update. (I have no idea why they're inclusive). The design was
supposed to handle this case but the implementation missed it and my
unit tests weren't thorough enough.

Fixes PR11335: loop unroll update.

llvm-svn: 144970
2011-11-18 03:42:41 +00:00
Nadav Rotem 1ec141d0f9 Add AVX2 vpbroadcast support
llvm-svn: 144967
2011-11-18 02:49:55 +00:00
Kostya Serebryany 1cdc6e9567 [asan] workaround for reg alloc bug 11395: don't instrument functions with large chunks of inline assembler
llvm-svn: 144962
2011-11-18 01:41:06 +00:00
Devang Patel 107e8ec30d DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
llvm-svn: 144937
2011-11-17 23:43:15 +00:00
Andrew Trick 949045864d Fix an overly general check in SimplifyIndvar to handle useless phi cycles.
The right way to check for a binary operation is
cast<BinaryOperator>. The original check: cast<Instruction> &&
numOperands() == 2 would match phi "instructions", leading to an
infinite loop in extreme corner case: a useless phi with operands
[self, constant] that prior optimization passes failed to remove,
being used in the loop by another useless phi, in turn being used by an
lshr or udiv.

Fixes PR11350: runaway iteration assertion.

llvm-svn: 144935
2011-11-17 23:36:35 +00:00
Kostya Serebryany 65e2211b95 fall back to explicit list of allowed linkages when instrumenting globals in asan; add a test check that asan does not touch linkonce_odr
llvm-svn: 144933
2011-11-17 23:14:59 +00:00
Chad Rosier f83ab704e4 When fast iseling a GEP, accumulate the offset rather than emitting a series of
ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to 
coalesce ADDs.
rdar://10412592

llvm-svn: 144886
2011-11-17 07:15:58 +00:00
Eli Friedman 489c0ff4a4 Add support for custom names for library functions in TargetLibraryInfo. Add a custom name for fwrite and fputs on x86-32 OSX. Make SimplifyLibCalls honor the custom
names for fwrite and fputs.

Fixes <rdar://problem/9815881>.

llvm-svn: 144876
2011-11-17 01:27:36 +00:00
Daniel Dunbar 586aabc44a build/make/test: Get rid of unused BUGPOINT_TOPTS variable.
llvm-svn: 144864
2011-11-16 23:56:03 +00:00
Eli Friedman ff1eaa7578 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
llvm-svn: 144863
2011-11-16 23:50:22 +00:00
Jim Grosbach f4d2e0d458 Remove obsolete test.
The PLD encoding is checked via the .s file now.

llvm-svn: 144853
2011-11-16 22:50:38 +00:00
Jim Grosbach d3f02cbce9 Generalize the fixup info for ARM mode.
We don't (yet) have the granularity in the fixups to be specific about which
bitranges are affected. That's a future cleanup, but we're not there yet.

llvm-svn: 144852
2011-11-16 22:48:37 +00:00
Jim Grosbach d66cb5ab33 Update test for r144842.
llvm-svn: 144851
2011-11-16 22:46:27 +00:00
Evan Cheng 011538dc79 Another missing X86ISD::MOVLPD pattern. rdar://10450317
llvm-svn: 144839
2011-11-16 22:24:44 +00:00
Evan Cheng 822ddde50d Disable expensive two-address optimizations at -O0. rdar://10453055
llvm-svn: 144806
2011-11-16 18:44:48 +00:00
Nick Lewycky db408f1857 Fix typo in test.
llvm-svn: 144774
2011-11-16 03:56:38 +00:00
Nick Lewycky c7f1e7993c Merge isObjectPointerWithTrustworthySize with getPointerSize. Use it when
looking at the size of the pointee. Fixes PR11390!

llvm-svn: 144773
2011-11-16 03:49:48 +00:00
Eli Friedman e6270395e3 Fix testcase.
llvm-svn: 144769
2011-11-16 03:03:52 +00:00
Eli Friedman 87f92512c3 CONCAT_VECTORS can have more than two operands. PR11389.
llvm-svn: 144768
2011-11-16 02:52:39 +00:00
Kostya Serebryany 6e6b03ec46 AddressSanitizer, first commit (compiler module only)
llvm-svn: 144758
2011-11-16 01:35:23 +00:00
Andrew Trick 90c7a108ca Fix SCEV overly optimistic back edge taken count for multi-exit loops.
Fixes PR11375: Different results for 'clang++ huh.cpp'...

llvm-svn: 144746
2011-11-16 00:52:40 +00:00
Jim Grosbach e891fe8d6c ARM assembly parsing for register range syntax for VLD/VST register lists.
For example,
vld1.f64 {d2-d5}, [r2,:128]!

Should be equivalent to:
vld1.f64 {d2,d3,d4,d5}, [r2,:128]!

It's not documented syntax in the ARM ARM, but it is consistent with what's
accepted for VLDM/VSTM and is unambiguous in meaning, so it's a good thing to
support.

rdar://10451128

llvm-svn: 144727
2011-11-15 23:19:15 +00:00
Nadav Rotem 37010002f2 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
llvm-svn: 144720
2011-11-15 22:50:37 +00:00
NAKAMURA Takumi f6b315c081 test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64.
llvm-svn: 144714
2011-11-15 22:30:37 +00:00
Jim Grosbach 75fb4abcdc ARM assembly parsing two operand forms for shift instructions.
llvm-svn: 144713
2011-11-15 22:27:54 +00:00
Pete Cooper 7c7ba1baa1 Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
2011-11-15 21:57:53 +00:00
Jim Grosbach 131b45e632 ARM alternate size suffices for VTRN instructions.
rdar://10435076

llvm-svn: 144694
2011-11-15 20:49:46 +00:00
Jim Grosbach 5803f6d5a2 ARM assembly parsing for optional datatype suffix on VFP VMOV GPR<->VFP insns.
Yet more of rdar://10435076.

llvm-svn: 144691
2011-11-15 20:29:42 +00:00
Jim Grosbach c5b1bc561e ARM assembly parsing for two-operand form of 'mul' instruction.
rdar://10449856.

llvm-svn: 144689
2011-11-15 20:14:51 +00:00
Jim Grosbach 72dfd20aba ARM assembly parsing for two-operand form of 'mul' instruction.
Ongoing rdar://10435114.

llvm-svn: 144688
2011-11-15 20:02:06 +00:00
Jim Grosbach 68c899c211 Testcase for r144684.
llvm-svn: 144685
2011-11-15 19:56:17 +00:00
Owen Anderson 0ac9058f89 Fix an ambiguous decoding where we failed to properly decode VMOVv2f32 and VMOVv4f32.
llvm-svn: 144683
2011-11-15 19:55:00 +00:00
Jim Grosbach 6efa7b9852 Thumb2 assembly parsing for mul.w in IT block fix.
When the 3rd operand is not a low-register, and the first two operands are
the same low register, the parser was incorrectly trying to use the 16-bit
instruction encoding.

rdar://10449281

llvm-svn: 144679
2011-11-15 19:29:45 +00:00
Rafael Espindola f11e7f1305 We currently use a callback to handle an IL pass deleting a BB that still
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674
2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen 4949b9a283 Revert r144611 and r144613.
These tests are actually correct, clang was miscompiling ExeDepsFix::processUses.

Evan fixed the miscompilation in r144628.

llvm-svn: 144630
2011-11-15 07:13:03 +00:00
Chandler Carruth 9b548a7fcf Rather than trying to use the loop block sequence *or* the function
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627
2011-11-15 06:26:43 +00:00
Craig Topper 05baa85f58 Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.
llvm-svn: 144622
2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen 9c0de9bb6b Really fix test.
llvm-svn: 144613
2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen 14b66375a9 Allow for depencendy-breaking instructions before cvt*.
This should unbreak clang-x86_64-darwin10-RA, but I can't actually
reproduce the failure.

llvm-svn: 144611
2011-11-15 02:29:48 +00:00
Evan Cheng 7ca4b6eb5c Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Jakob Stoklund Olesen f8ad336bc4 Break false dependencies before partial register updates.
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602
2011-11-15 01:15:30 +00:00
Jim Grosbach a498af2b1d ARM parsing datatype suffix variants for non-writeback VST1 instructions.
rdar://10435076

llvm-svn: 144593
2011-11-14 23:43:46 +00:00
Jim Grosbach 72838a0345 ARM parsing datatype suffix variants for non-writeback VLD1 instructions.
rdar://10435076

llvm-svn: 144592
2011-11-14 23:32:59 +00:00
Jim Grosbach 3d6c0e0bb2 ARM parsing optional datatype suffix for VAND/VEOR/VORR instructions.
rdar://10435076

llvm-svn: 144587
2011-11-14 23:11:19 +00:00
Jim Grosbach 3e2c6f380c ARM VLDR/VSTR instructions don't need a size suffix.
Canonicallize on the non-suffixed form, but continue to accept assembly that
has any correctly sized type suffix.

llvm-svn: 144583
2011-11-14 23:03:21 +00:00
Nick Lewycky 7013a19e8a Refactor capture tracking (which already had a couple flags for whether returns
and stores capture) to permit the caller to see each capture point and decide
whether to continue looking.

Use this inside memdep to do an analysis that basicaa won't do. This lets us
solve another devirtualization case, fixing PR8908!

llvm-svn: 144580
2011-11-14 22:49:42 +00:00
Chad Rosier 4e88fbebde Add newline to end of file. Thanks, Eli.
llvm-svn: 144579
2011-11-14 22:48:33 +00:00
Chad Rosier ab7223e99a Add support for inlining small memcpys.
rdar://10412592

llvm-svn: 144578
2011-11-14 22:46:17 +00:00
Chad Rosier 45110fdf8d Fix a performance regression from r144565. Positive offsets were being lowered
into registers, rather then encoded directly in the load/store.

llvm-svn: 144576
2011-11-14 22:34:48 +00:00
Evan Cheng fb13d32b3f Add a missing pattern for X86ISD::MOVLPD. rdar://10436044
llvm-svn: 144566
2011-11-14 20:35:52 +00:00
Chad Rosier adfd200bcb Add support for Thumb load/stores with negative offsets.
rdar://10412592

llvm-svn: 144565
2011-11-14 20:22:27 +00:00
Evan Cheng 30f44ad785 Teach two-address pass to re-schedule two-address instructions (or the kill
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559
2011-11-14 19:48:55 +00:00
Pete Cooper 890e02e854 Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered
Constant idx case is still done in tablegen but other cases are then expanded

Fixes <rdar://problem/10435460>

llvm-svn: 144557
2011-11-14 19:38:42 +00:00
Jakob Stoklund Olesen 7e6004a3c1 Fix early-clobber handling in shrinkToUses.
I broke this in r144515, it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
2011-11-14 18:45:38 +00:00
Jakob Stoklund Olesen 7e07b388ac Delete stale comment.
llvm-svn: 144542
2011-11-14 18:03:05 +00:00
Chandler Carruth ed5aa547bc Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526
2011-11-14 08:50:16 +00:00
Chad Rosier 2a1df883d0 Add support for ARM halfword load/stores and signed byte loads with negative
offsets.
rdar://10412592

llvm-svn: 144518
2011-11-14 04:09:28 +00:00
Chandler Carruth 1071cfa4ae Teach machine block placement to cope with unnatural loops. These don't
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516
2011-11-14 00:00:35 +00:00
Chandler Carruth 8d15078927 Rewrite #3 of machine block placement. This is based somewhat on the
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495
2011-11-13 11:20:44 +00:00