Commit Graph

6687 Commits

Author SHA1 Message Date
Owen Anderson d1545e3715 Try to make this test more generic to unbreak buildbots.
llvm-svn: 162958
2012-08-30 23:51:20 +00:00
Owen Anderson cc61f87cf7 Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants.
llvm-svn: 162956
2012-08-30 23:35:16 +00:00
Nadav Rotem ea973bda26 Currently targets that do not support selects with scalar conditions and vector operands - scalarize the code. ARM is such a target
because it does not support CMOV of vectors. To implement this efficientlyi, we broadcast the condition bit and use a sequence of NAND-OR
to select between the two operands. This is the same sequence we use for targets that don't have vector BLENDs (like SSE2).

rdar://12201387

llvm-svn: 162926
2012-08-30 19:17:29 +00:00
Michael Liao bbd10792c2 Introduce 'UseSSEx' to force SSE legacy encoding
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
  enabled.

  As the penalty of inter-mixing SSE and AVX instructions, we need
  prevent SSE legacy insn from being generated except explicitly
  specified through some intrinsics. For patterns supported by both
  SSE and AVX, so far, we force AVX insn will be tried first relying on
  AddedComplexity or position in td file. It's error-prone and
  introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, and etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
   selected by fast-isel due to complicated pattern and fast-isel
   fallback to materialize it from constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation is different from SSE one after fixing SSE/AVX
   inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
   'vmovaps'.

llvm-svn: 162919
2012-08-30 16:54:46 +00:00
Tim Northover ca9f384ff8 Add support for moving pure S-register to NEON pipeline if desired
llvm-svn: 162898
2012-08-30 10:17:45 +00:00
Michael Liao 271f11b571 Should put test case under test/ExecutionEngine/MCJIT/
llvm-svn: 162885
2012-08-30 00:43:57 +00:00
Michael Liao 3c8980646b Fix PR13727
- The root cause is that target constant materialization in X86 fast-isel
  creates a PC-rel addressing which may overflow 32-bit range in non-Small code
  model if .rodata section is allocated too far away from code segment in
  MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
  non-Small code model.

llvm-svn: 162881
2012-08-30 00:30:16 +00:00
Hal Finkel 1859d26528 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854
2012-08-29 20:22:24 +00:00
Jush Lu e87e559e62 [arm-fast-isel] Add support for ARM PIC.
llvm-svn: 162823
2012-08-29 02:41:21 +00:00
Roman Divacky 8c4b6a307e Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel 742b535e40 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Bill Wendling cc56718038 The commutative flag is already correctly set within the multiclass. If we set
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>

llvm-svn: 162741
2012-08-28 07:36:46 +00:00
Craig Topper bd509eea4a Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo.
llvm-svn: 162738
2012-08-28 07:05:28 +00:00
NAKAMURA Takumi cdfe1d1cdb llvm/test/CodeGen/X86/pr12312.ll: Add -mtriple=x86_64-unknown-unknown.
llvm-svn: 162736
2012-08-28 04:04:29 +00:00
Michael Liao b7d85b6328 Fix PR12312
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
  Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
  X86ISD::OR node has only its flag result being used as a boolean value and
  all its leaves are extracted from the same vector, it could be folded into an
  X86ISD::PTEST node.

llvm-svn: 162735
2012-08-28 03:34:40 +00:00
Jakob Stoklund Olesen 87cb471e52 Remove extra MayLoad/MayStore flags from atomic_load/store.
These extra flags are not required to properly order the atomic
load/store instructions. SelectionDAGBuilder chains atomics as if they
were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on
the memory operands of all atomic operations.

The volatile bit is enough to order atomic loads and stores during and
after SelectionDAG.

This means we set mayLoad on atomic_load, mayStore on atomic_store, and
mayLoad+mayStore on the remaining atomic read-modify-write operations.

llvm-svn: 162733
2012-08-28 03:11:32 +00:00
Akira Hatanaka b5af7121b1 Fix mips' long branch pass.
Instructions emitted to compute branch offsets now use immediate operands
instead of symbolic labels. This change was needed because there were problems
when R_MIPS_HI16/LO16 relocations were used to make shared objects.

llvm-svn: 162731
2012-08-28 03:03:05 +00:00
Akira Hatanaka adb14f56c7 Fix bug 13532.
In SelectionDAGLegalize::ExpandLegalINT_TO_FP, expand INT_TO_FP nodes without
using any f64 operations if f64 is not a legal type.

Patch by Stefan Kristiansson. 

llvm-svn: 162728
2012-08-28 02:12:42 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel 5ab378037f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel e39526a789 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Bill Wendling 988a47d7e5 Make sure we add the predicate after all of the registers are added.
<rdar://problem/12183003>

llvm-svn: 162703
2012-08-27 22:12:44 +00:00
NAKAMURA Takumi fee50c8cf1 llvm/test/CodeGen/X86/fma.ll: Add -march=x86, or two tests would fail on non-x86 hosts.
llvm-svn: 162667
2012-08-27 11:50:26 +00:00
NAKAMURA Takumi 10eb4cfc3e llvm/test/CodeGen/X86/fma_patterns.ll: Add -mtriple=x86_64. It was incompatible on i686 and Windows x64.
llvm-svn: 162664
2012-08-27 09:37:54 +00:00
Craig Topper bfc1d0ed48 Commit test change for r162658.
llvm-svn: 162660
2012-08-27 07:55:50 +00:00
Anitha Boyapati 0dd589c5f1 FMA3 tests on bdver2 target for changes made in rev 162012. Also made
corresponding changes to existing tests for darwin triple to ensure that
same pattern is tested for bdver2 target.

llvm-svn: 162655
2012-08-27 06:59:01 +00:00
Craig Topper 9e4f0aae17 Make sure that FMA3 is favored even when FMA4 is also enabled. Test case for r162454.
llvm-svn: 162653
2012-08-27 03:38:15 +00:00
Jakob Stoklund Olesen c2272df1be Infer instruction properties from single-instruction patterns.
Previously, instructions without a primary patterns wouldn't get their
properties inferred. Now, we use all single-instruction patterns for
inference, including 'def : Pat<>' instances.

This causes a lot of instruction flags to change.

- Many instructions no longer have the UnmodeledSideEffects flag because
  their flags are now inferred from a pattern.

- Instructions with intrinsics will get a mayStore flag if they already
  have UnmodeledSideEffects and a mayLoad flag if they already have
  mayStore. This is because intrinsics properties are linear.

- Instructions with atomic_load patterns get a mayStore flag because
  atomic loads can't be reordered. The correct workaround is to create
  pseudo-instructions instead of using normal loads. PR13693.

llvm-svn: 162614
2012-08-24 22:46:53 +00:00
Akira Hatanaka 4a08a4a8b6 Disable Mips' delay slot filler when optimization level is O0.
llvm-svn: 162589
2012-08-24 20:40:15 +00:00
Akira Hatanaka e8e4ef102d In MipsDAGToDAGISel::SelectAddr, fold add node into address operand, if its
second operand is MipsISD::GPRel.

llvm-svn: 162584
2012-08-24 20:21:49 +00:00
Manman Ren cf10446ffa BranchProb: modify the definition of an edge in BranchProbabilityInfo to handle
the case of multiple edges from one block to another.

A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.

Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.

llvm-svn: 162572
2012-08-24 18:14:27 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Stepan Dyatkovskiy 99120e04be Rejected 169195. As Duncan commented, bitcasting to proper type is wrong approach. We need to insert some valid TRANCATE node here.
llvm-svn: 162354
2012-08-22 09:33:55 +00:00
Akira Hatanaka ad4950258b Add register Mips::GP to the list of reserved registers if target is bare-metal
to prevent it from being clobbered. mips uses $gp to access small data section.

This bug was originally reported by Carl Norum.

llvm-svn: 162340
2012-08-22 03:18:13 +00:00
Akira Hatanaka 9d957842e1 Add option disable-mips-delay-filler. Turn on mips' delay slot filler by
default.

Patch by Carl Norum.

llvm-svn: 162339
2012-08-22 02:51:28 +00:00
Tim Northover f39c1a3f72 Add correct set of regression tests for r162094 commit.
llvm-svn: 162276
2012-08-21 12:43:03 +00:00
Jakob Stoklund Olesen 74e6f9fc65 Add a missing def flag.
*** Bad machine code: Explicit definition marked as use ***
- function:    test_cos
- basic block: BB#0 L.entry (0x7ff2a2024fd0)
- instruction: VSETLNi32 %D11, %D11<undef>, %R0, 0, pred:14, pred:%noreg, %Q5<imp-use,kill>, %Q5<imp-def>
- operand 0:   %D11

llvm-svn: 162247
2012-08-21 00:34:53 +00:00
Jakob Stoklund Olesen 7d33c5739f Don't add CFG edges for redundant conditional branches.
IR that hasn't been through SimplifyCFG can look like this:

  br i1 %b, label %r, label %r

Make sure we don't create duplicate Machine CFG edges in this case.

Fix the machine code verifier to accept conditional branches with a
single CFG edge.

llvm-svn: 162230
2012-08-20 21:39:52 +00:00
Michael Liao 10ff96ce8c fix a case where all operands of BUILD_VECTOR are undefined
llvm-svn: 162214
2012-08-20 17:59:18 +00:00
Stepan Dyatkovskiy 6ee89aafc8 Forget to add testcase for r162195. Sorry.
llvm-svn: 162196
2012-08-20 08:03:18 +00:00
Nadav Rotem 178250ad87 When unsafe math is used, we can use commutative FMAX and FMIN. In some cases
this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
2012-08-19 13:06:16 +00:00
Jakob Stoklund Olesen dded061f85 Also combine zext/sext into selects for ARM.
This turns common i1 patterns into predicated instructions:

  (add (zext cc), x) -> (select cc (add x, 1), x)
  (add (sext cc), x) -> (select cc (add x, -1), x)

For a function like:

  unsigned f(unsigned s, int x) {
    return s + (x>0);
  }

We now produce:

  cmp r1, #0
  it  gt
  addgt.w r0, r0, #1

Instead of:

  movs  r2, #0
  cmp r1, #0
  it  gt
  movgt r2, #1
  add r0, r2

llvm-svn: 162177
2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen aab43dbfbb Also pass logical ops to combineSelectAndUse.
Add these transformations to the existing add/sub ones:

  (and (select cc, -1, c), x) -> (select cc, x, (and, x, c))
  (or  (select cc, 0, c), x)  -> (select cc, x, (or, x, c))
  (xor (select cc, 0, c), x)  -> (select cc, x, (xor, x, c))

The selects can then be transformed to a single predicated instruction
by peephole.

This transformation will make it possible to eliminate the ISD::CAND,
COR, and CXOR custom DAG nodes.

llvm-svn: 162176
2012-08-18 21:25:16 +00:00
Nadav Rotem a136939fa9 Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code.
llvm-svn: 162172
2012-08-18 17:53:03 +00:00
Nadav Rotem c324af609e Revert r162160 because it made a few buildbots fail.
llvm-svn: 162164
2012-08-18 05:02:36 +00:00
Nadav Rotem 2cb14a5c4b The X86 backend has a number of optimizations for SETCC nodes which use
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.

Before:
  xorl  %esi, %edi
  testb %dil, %dil
  setne %al
  ret

After:
  xorb  %dil, %sil
  setne %al
  ret

rdar://12081007

llvm-svn: 162160
2012-08-18 02:43:28 +00:00
Eli Friedman 79a6b30d8a Make atomic load and store of pointers work. Tighten verification of atomic operations
so other unexpected operations don't slip through.  Based on patch by Logan Chien.
PR11786/PR13186.

llvm-svn: 162146
2012-08-17 23:24:29 +00:00
Jakob Stoklund Olesen 7b1a2e8f02 Avoid folding ADD instructions with FI operands.
PEI can't handle the pseudo-instructions. This can be removed when the
pseudo-instructions are replaced by normal predicated instructions.

Fixes PR13628.

llvm-svn: 162130
2012-08-17 20:55:34 +00:00
Benjamin Kramer ca7ca4f6c6 TargetLowering: Use the large shift amount during legalize types. The legalizer may call us with an overly large type.
llvm-svn: 162101
2012-08-17 15:54:21 +00:00
Benjamin Kramer 2f47a3fb07 Fix broken check lines.
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.

llvm-svn: 162097
2012-08-17 12:28:26 +00:00
Tim Northover f66181530f Implement NEON domain switching for scalar <-> S-register vmovs on ARM
llvm-svn: 162094
2012-08-17 11:32:52 +00:00
Jakob Stoklund Olesen 0ea1fce6b4 Add ADD and SUB to the predicable ARM instructions.
It is not my plan to duplicate the entire ARM instruction set with
predicated versions. We need a way of representing predicated
instructions in SSA form without requiring a separate opcode.

Then the pseudo-instructions can go away.

llvm-svn: 162061
2012-08-16 23:21:55 +00:00
Jush Lu 26088cb30e [arm-fast-isel] Add support for fastcc.
Without fastcc support, the caller just falls through to CallingConv::C
for fastcc, but callee still uses fastcc, this inconsistency of calling
convention is a problem, and fastcc support can fix it.

llvm-svn: 162013
2012-08-16 05:15:53 +00:00
Akira Hatanaka 269d3fd101 Test case for r162008.
llvm-svn: 162009
2012-08-16 03:48:41 +00:00
Jakob Stoklund Olesen 6cb96120f1 Fold predicable instructions into MOVCC / t2MOVCC.
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.

This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.

llvm-svn: 161994
2012-08-15 22:16:39 +00:00
Bill Wendling 2c8685e327 Rework test so that it reproduces the error without the horrible flag.
llvm-svn: 161989
2012-08-15 21:10:18 +00:00
Bill Wendling d63f1f5a9c Remove invalid test. This test requires that dead basic blocks be kept
around. That's not how we do things. Besides, the commit message tells us that
it is covered by the GCC test suite.

------------------------------------------------------------------------
r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines

Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
------------------------------------------------------------------------

llvm-svn: 161985
2012-08-15 20:54:09 +00:00
Evan Cheng eec6bc6270 Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029
llvm-svn: 161962
2012-08-15 17:44:53 +00:00
Anton Korobeynikov c6d945b11a The names of VFP variants of half-to-float conversion instructions were
reversed. This leads to wrong codegen for float-to-half conversion
intrinsics which are used to support storage-only fp16 type.
NEON variants of same instructions are fine.

llvm-svn: 161907
2012-08-14 23:36:01 +00:00
Michael Liao 34107b9177 fix PR11334
- FP_EXTEND only support extending from vectors with matching elements.
  This results in the scalarization of extending to v2f64 from v2f32,
  which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.

llvm-svn: 161894
2012-08-14 21:24:47 +00:00
Nadav Rotem 70409991bc During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998

llvm-svn: 161852
2012-08-14 05:19:07 +00:00
NAKAMURA Takumi 245920463d llvm/test/CodeGen/ARM/floorf.ll: Add explicit -mtriple=arm-unknown-unknown, or it fails on msvc.
llvm-svn: 161825
2012-08-14 00:56:06 +00:00
Owen Anderson a40319b7f1 Add a roundToIntegral method to APFloat, which can be parameterized over various rounding modes. Use this to implement SelectionDAG constant folding of FFLOOR, FCEIL, and FTRUNC.
llvm-svn: 161807
2012-08-13 23:32:49 +00:00
Bill Wendling 72baa6eeae Rename test since it's not linux-specific.
llvm-svn: 161792
2012-08-13 21:32:42 +00:00
Jakob Stoklund Olesen 83a927d84a Handle extra Tail predecessors in if-conversion.
It is still possible to if-convert if the tail block has extra
predecessors, but the tail phis must be rewritten instead of being
removed.

llvm-svn: 161781
2012-08-13 20:49:04 +00:00
Arnold Schwaighofer 0bb7f23cfc [Hexagon] Don't mark callee saved registers as clobbered by a tail call
This was causing unnecessary spills/restores of callee saved registers.

Fixes PR13572.

Patch by Pranav Bhandarkar!

llvm-svn: 161778
2012-08-13 19:54:01 +00:00
Manman Ren 9746b33e26 Fix failure on Atom bot due to r161769
llvm-svn: 161777
2012-08-13 19:34:29 +00:00
Nadav Rotem 3a94c545cf Do not optimize (or (and X,Y), Z) into BFI and other sequences if the AND ISDNode has more than one user.
rdar://11876519

llvm-svn: 161775
2012-08-13 18:52:44 +00:00
Manman Ren 959acb106b X86: move Int_CVTSD2SSrr, Int_CVTSI2SSrr, Int_CVTSI2SDrr, Int_CVTSS2SDrr from
OpTbl1 to OpTbl2 since they have 3 operands and the last operand can be changed
to a memory operand.

PR13576

llvm-svn: 161769
2012-08-13 18:29:41 +00:00
Eric Christopher 7d8b53c1f8 Add support for the %H output modifier.
Patch by Weiming Zhao.

llvm-svn: 161768
2012-08-13 18:18:52 +00:00
Tim Northover b4abb84d9c Add test for previous commit correcting NEON load patterns.
llvm-svn: 161750
2012-08-13 10:38:45 +00:00
Arnold Schwaighofer b73da9453c Revert 161581: Patch to implement UMLAL/SMLAL instructions for the ARM
architecture

It broke MultiSource/Applications/JM/ldecod/ldecod on armv7 thumb O0 g and armv7
thumb O3.

llvm-svn: 161736
2012-08-12 05:11:56 +00:00
Michael Liao e7e828fd64 fix PR13577, an issue introduced by r161687
- FCMOV only supports a subset of X86 conditions. Skip boolean
  simplification if X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.

llvm-svn: 161732
2012-08-11 23:47:06 +00:00
Benjamin Kramer ef6494f24d PR13578: Teach MachineCSE that instructions that use a constant register can be CSE'd safely.
This is common e.g. when doing rip-relative addressing on x86_64.

llvm-svn: 161728
2012-08-11 19:05:13 +00:00
Manman Ren 1acb6707cd X86: when we are auto-detecting the subtarget features, make sure we turn on
FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge.

FeatureFastUAMem is already on if we pass in nehalem or westmere as a command
argument.

rdar: 7252306
llvm-svn: 161717
2012-08-10 23:43:32 +00:00
Eli Friedman 4c923b3b3f The normal edge of an invoke is not allowed to branch to a block with a
landingpad.  Enforce it in the verifier, and fix the regression tests to match.

llvm-svn: 161697
2012-08-10 20:55:20 +00:00
Michael Liao 5248e9913f add X86-specific DAG optimization to simplify boolean test
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
  generated from X86ISD::SETCC, try to simplify the boolean value
  generation and checking by reusing the original EFLAGS with proper
  condition code
- add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places
  consuming EFLAGS

part of patches fixing PR12312

llvm-svn: 161687
2012-08-10 19:58:13 +00:00
Jakob Stoklund Olesen 8c28ac9ec9 Update edge weights correctly in replaceSuccessor().
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.

llvm-svn: 161653
2012-08-10 03:23:27 +00:00
Jakob Stoklund Olesen d9b66506a3 Reapply r161633-161634 "Partition use lists so defs always come before uses.""
No changes to these patches, MRI needed to be notified when changing
uses into defs and vice versa.

llvm-svn: 161644
2012-08-10 00:21:30 +00:00
Jakob Stoklund Olesen acd27c9279 Revert r161633-161634 "Partition use lists so defs always come before uses."
These commits broke a number of buildbots.

llvm-svn: 161640
2012-08-09 23:31:36 +00:00
Jakob Stoklund Olesen df01e00710 Partition use lists so defs always come before uses.
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.

In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.

Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).

llvm-svn: 161634
2012-08-09 22:49:46 +00:00
Jakob Stoklund Olesen 7d7051ca3c Don't use pointer-pointers for the register use lists.
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.

The test changes are required because the register allocation hint is
using the use-list order to break ties.

llvm-svn: 161633
2012-08-09 22:49:42 +00:00
Jakob Stoklund Olesen 4238a89db8 Don't modify MO while use_iterator is still pointing to it.
llvm-svn: 161626
2012-08-09 22:08:24 +00:00
Arnold Schwaighofer 81b2eec1ab Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Nadav Rotem e0f84d31c8 Fix the legalization of ExtLoad on ARM. ExpandUnalignedLoad did not properly
handle the cases where the memory value type was illegal. 
PR 13111. 

llvm-svn: 161565
2012-08-09 01:56:44 +00:00
Bob Wilson 4c65c505e0 Add test triples to fix win32 failures. Revert workaround from r161292.
I don't have a win32 system to test, so hopefully I got them all fixed here.

llvm-svn: 161519
2012-08-08 20:31:37 +00:00
Manman Ren 1be131ba27 X86: enable CSE between CMP and SUB
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.

Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.

rdar://11873276

llvm-svn: 161462
2012-08-08 00:51:41 +00:00
Evan Cheng fbdd25c135 X86 cmp lowering is looking past truncate on the condition node. It should only
do so when the high bits are known zero. This caused a subtle miscompilation.

rdar://12027825 

llvm-svn: 161451
2012-08-07 22:21:00 +00:00
Chandler Carruth 881d0a7966 Add a much more conservative strategy for aligning branch targets.
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.

As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.

Fixes PR13265.

llvm-svn: 161409
2012-08-07 09:45:24 +00:00
Manman Ren cb36b8c2e6 MachineCSE: Update the heuristics for isProfitableToCSE.
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.

rdar://11393714 and rdar://11819721

llvm-svn: 161396
2012-08-07 06:16:46 +00:00
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Craig Topper ab47fe4e16 Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.
llvm-svn: 161318
2012-08-06 06:22:36 +00:00
Craig Topper 812005e562 Update test to check for r161305
llvm-svn: 161307
2012-08-05 09:06:28 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Anton Korobeynikov 218aaf6d04 Add stack spill / reload instructions for DTriple and DQuad register classes, which
were missed for no reason. This fixes PR13377

llvm-svn: 161299
2012-08-04 13:16:12 +00:00
Bob Wilson 874886cd66 Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

llvm-svn: 161282
2012-08-03 23:29:17 +00:00
Akira Hatanaka 22bec282e9 1. Redo mips16 instructions to avoid multiple opcodes for same instruction.
Change these to patterns.
2. Add another 16 instructions.

Patch by Reed Kotler.

llvm-svn: 161272
2012-08-03 22:57:02 +00:00
Bob Wilson fa59485b94 Fix memcmp code-gen to honor -fno-builtin.
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin.  I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.

llvm-svn: 161262
2012-08-03 21:26:18 +00:00
Bob Wilson 3e6fa462f3 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Jush Lu 4705da9020 [arm-fast-isel] Add support for shl, lshr, and ashr.
llvm-svn: 161230
2012-08-03 02:37:48 +00:00
Manman Ren ba8122cc25 X86 Peephole: fold loads to the source register operand if possible.
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.

rdar://10554090 and rdar://11873276

llvm-svn: 161207
2012-08-02 19:37:32 +00:00
Akira Hatanaka fffad897f2 Set transient stack alignment in constructor of MipsFrameLowering and re-enable
test o32_cc_vararg.ll.

llvm-svn: 161189
2012-08-02 18:15:13 +00:00
NAKAMURA Takumi 7020f51622 llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Make sure this is testing without +avx.
FIXME: Could +avx be checked here too?
llvm-svn: 161156
2012-08-02 06:36:56 +00:00
NAKAMURA Takumi aaca1e690d llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Rewrite expressions to pass regardless of PR11031.
- Relax to match even if epilogue (pop %ebp) were emitted.
  - Assume the return value is stored to %xmm0.

llvm-svn: 161155
2012-08-02 06:33:58 +00:00
Manman Ren 5759d01230 X86 Peephole: fold loads to the source register operand if possible.
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.

This patch is a rework of r160919 and was tested on clang self-host on my local
machine.

rdar://10554090 and rdar://11873276

llvm-svn: 161152
2012-08-02 00:56:42 +00:00
Matt Beaumont-Gay 7947aecaf1 Line endings.
llvm-svn: 161117
2012-08-01 16:42:35 +00:00
Elena Demikhovsky 3cb3b0045c Added FMA functionality to X86 target.
llvm-svn: 161110
2012-08-01 12:06:00 +00:00
Akira Hatanaka d1c43cee24 Add definitions of two subclasses of MipsFrameLowering, Mips16FrameLowering and
MipsSEFrameLowering.

Implement MipsSEFrameLowering::hasReservedCallFrame. Call frames will not be
reserved if there is a call with a large call frame or there are variable sized
objects on the stack.

llvm-svn: 161090
2012-07-31 22:50:19 +00:00
Akira Hatanaka 02de0e4425 Let PEI::calculateFrameObjectOffsets compute the final stack size rather than
computing it in MipsFrameLowering::emitPrologue.

llvm-svn: 161078
2012-07-31 21:28:49 +00:00
Akira Hatanaka 33a25af5a8 Expand DYNAMIC_STACKALLOC nodes rather than doing custom-lowering.
The frame object which points to the dynamically allocated area will not be
needed after changes are made to cease reserving call frames.

llvm-svn: 161076
2012-07-31 20:54:48 +00:00
Akira Hatanaka beda2241a4 When store nodes or memcpy nodes are created to copy the function call
arguments to the stack in MipsISelLowering::LowerCall, use stack pointer and
integer offset operands rather than frame object operands.

llvm-svn: 161068
2012-07-31 18:46:41 +00:00
Chad Rosier 710be7df71 [x86 frame lowering] In 32-bit mode, use ESI as the base pointer.
Previously, we were using EBX, but PIC requires the GOT to be in EBX before 
function calls via PLT GOT pointer.

llvm-svn: 161066
2012-07-31 18:29:21 +00:00
Akira Hatanaka 4ce7c4060d Fix type of LUXC1 and SUXC1. These instructions were incorrectly defined as
single-precision load and store.

Also avoid selecting LUXC1 and SUXC1 instructions during isel. It is incorrect
to map unaligned floating point load/store nodes to these instructions.

llvm-svn: 161063
2012-07-31 18:16:49 +00:00
Manman Ren 8c549b586c MachineSink: Sort the successors before trying to find SuccToSinkTo.
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.

rdar://11980766

llvm-svn: 161062
2012-07-31 18:10:39 +00:00
Jakob Stoklund Olesen 0c807dfae2 Clear kill flags in removeCopyByCommutingDef().
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.

<rdar://problem/11950722>

llvm-svn: 161023
2012-07-31 02:47:24 +00:00
Manman Ren 2b6a0dfd4c Reverse order of the two branches at end of a basic block if it is profitable.
We branch to the successor with higher edge weight first.
Convert from
     je    LBB4_8  --> to outer loop
     jmp   LBB4_14 --> to inner loop
to
     jne   LBB4_14
     jmp   LBB4_8

PR12750
rdar: 11393714

llvm-svn: 161018
2012-07-31 01:11:07 +00:00
Pete Cooper 91244268d7 Consider address spaces for hashing and CSEing DAG nodes. Otherwise two loads from different x86 segments but the same address would get CSEd
llvm-svn: 160987
2012-07-30 20:23:19 +00:00
Manman Ren f87dd7c01b Revert r160920 and r160919 due to dragonegg and clang selfhost failure
llvm-svn: 160927
2012-07-29 02:44:09 +00:00
Manman Ren 9de95e779c X86 Peephole: fold loads to the source register operand if possible.
Trying to fix the bot by specifying a triple in the failing testing cases.

llvm-svn: 160920
2012-07-28 17:51:24 +00:00
Manman Ren 0fa3ab88ba X86 Peephole: fold loads to the source register operand if possible.
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.

rdar://10554090 and rdar://11873276

llvm-svn: 160919
2012-07-28 16:48:01 +00:00
Manman Ren 32367c063b X86 Peephole: fix PR13475 in optimizeCompare.
It is possible that an instruction can use and update EFLAGS.
When checking the safety, we should check the usage of EFLAGS first before
declaring it is safe to optimize due to the update.

llvm-svn: 160912
2012-07-28 03:15:46 +00:00
Evan Cheng 249716e8ae Teach CodeGenPrep to look past bitcast when it's duplicating return instruction
into predecessor blocks to enable tail call optimization.

rdar://11958338

llvm-svn: 160894
2012-07-27 21:21:26 +00:00
Jakob Stoklund Olesen bc65e8f94e Add <imp-def> of super-register when lowering SUBREG_TO_REG.
Patch by Tyler Nowicki!

llvm-svn: 160888
2012-07-27 20:19:49 +00:00
Jakob Stoklund Olesen ceee4a9d0c Eliminate a batch of uses of sub_ss and sub_sd in the X86 target.
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves.  They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.

This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.

The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.

llvm-svn: 160816
2012-07-26 21:40:42 +00:00
Akira Hatanaka 64626fc20f Fix call setup for PIC.
Patch by Reed Kotler.

llvm-svn: 160774
2012-07-26 02:24:43 +00:00
Manman Ren e8c6b15137 Update testing case for Atom when disabling rematerialization in
TwoAddressInstructionPass.

The generated code for Atom has a different code sequence. This is realted
to commit r160749.

llvm-svn: 160755
2012-07-25 20:17:14 +00:00
Manman Ren cc1dc6dc11 Disable rematerialization in TwoAddressInstructionPass.
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.

This also fixed rdar://11830760 where xor is expected instead of movi0.

llvm-svn: 160749
2012-07-25 18:28:13 +00:00
Rafael Espindola 11c38b9657 When a return struct pointer is passed in registers, the called has nothing
to pop.

llvm-svn: 160725
2012-07-25 13:41:10 +00:00
Akira Hatanaka 5a69c235ae Eliminate the stack slot used to save the global base register.
The long branch pass (fixed in r160601) no longer uses the global base register
to compute addresses of branch destinations, so it is not necessary to reserve
a slot on the stack.

llvm-svn: 160703
2012-07-25 03:16:47 +00:00
Rafael Espindola a92cf29f0d Add a cpu to the test. Should fix the atom bot.
llvm-svn: 160701
2012-07-24 22:56:06 +00:00
Rafael Espindola f30e9bfb90 Add a triple to the test.
llvm-svn: 160698
2012-07-24 21:55:04 +00:00
Rafael Espindola a44e193a11 In order to correctly compile
struct s {
  double x1;
  float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
  f(41, 42, 43);
}

We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.

llvm-svn: 160695
2012-07-24 21:40:17 +00:00
David Chisnall 5b8c1680de ELF does not imply GNU/Linux. Do not assume GNU conventions just because we
are targeting an ELF platform.  Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.

This fixes PR13438.

llvm-svn: 160687
2012-07-24 20:04:16 +00:00
Akira Hatanaka 26e9ecb7a3 Add basic ability to setup call frame, and make procedure calls.
Hello world will compile and execute with this patch.

Patch by Reed Kotler.

llvm-svn: 160651
2012-07-23 23:45:54 +00:00
Sylvestre Ledru 35521e2310 Fix a typo (the the => the)
llvm-svn: 160621
2012-07-23 08:51:15 +00:00
Nadav Rotem 9056076cab Fixed DAGCombine optimizations which generate select_cc for targets
that do not support it (X86 does not lower select_cc).

PR: 13428

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160619
2012-07-23 07:59:50 +00:00
Akira Hatanaka f72efdb62f Fix Mips long branch pass.
This pass no longer requires that the global pointer value be saved to the
stack or register since it uses bal instruction to compute branch distance.

llvm-svn: 160601
2012-07-21 03:30:44 +00:00
Jakob Stoklund Olesen e2cfd0d45a Avoid folding loads that are unsafe to move.
LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load
into its only use. Only do that when the load is safe to move, and it
won't extend any live ranges.

This fixes PR13414.

llvm-svn: 160575
2012-07-20 21:29:31 +00:00
Jakob Stoklund Olesen f62c07f147 Split loop exiting edges more aggressively.
PHIElimination splits critical edges when it predicts it can resolve
interference and eliminate copies. It doesn't split the edge if the
interference wouldn't be resolved anyway because the phi-use register is
live in the critical edge anyway.

Teach PHIElimination to split loop exiting edges with interference, even
if it wouldn't resolve the interference. This removes the necessary
copies from the loop, which is still an improvement from injecting the
copies into the loop.

The test case demonstrates the improvement. Before:

LBB0_1:
  cmpb  $0, (%rdx)
  leaq  1(%rdx), %rdx
  movl  %esi, %eax
  je  LBB0_1

After:

LBB0_1:
  cmpb  $0, (%rdx)
  leaq  1(%rdx), %rdx
  je  LBB0_1

  movl  %esi, %eax

llvm-svn: 160571
2012-07-20 20:49:53 +00:00
Preston Gurd f2ea70ae4a Fix remaining lit tests which were failing when run on an Atom
processor.

Patches by Tyler Nowicki, Andy Zhang, and Preston Gurd!

llvm-svn: 160520
2012-07-19 18:53:21 +00:00
Jush Lu e67e07b901 [arm-fast-isel] Add support for vararg function calls.
llvm-svn: 160500
2012-07-19 09:49:00 +00:00
Manman Ren d0a4ee8427 X86: remove redundant cmp against zero.
Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.

rdar://11855129

llvm-svn: 160454
2012-07-18 21:40:01 +00:00
Preston Gurd f0a48ec8f1 This patch fixes 8 out of 20 unexpected failures in "make check"
when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.

These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
are fixed.

Patch by Andy Zhang and Tyler Nowicki!

llvm-svn: 160451
2012-07-18 20:49:17 +00:00
Chandler Carruth 985454e0ac Fix a somewhat nasty crasher in PR13378. This crashes inside of
LiveIntervals due to the two-addr pass generating bogus MI code.

The crux of the issue was a loop nesting problem. The intent of the code
which attempts to transform instructions before converting them to
two-addr form is to defer and reprocess any transformed instructions as
the second processing is likely to have more opportunities to coalesce
copies, etc. Unfortunately, there was one section of processing that was
not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this
rewriting proceeded, not only did it occur early, it removed the bits of
information needed for the deferred processing to correctly generate the
necessary two address form (specifically inserting a copy), but didn't
trigger any immediate assertions and produced what appeared to be
already valid two-address from code. Thus, the assertion only fired much
later in the pipeline.

The fix is to hoist the transformation logic up layer to where it can
more firmly defer all further processing, and to teach the normal
processing to handle an edge case previously handled as part of the
transformation logic. This edge case (already matched tied register
operands) needs to *not* defer any steps.

As has been brought up repeatedly in the process: wow does this code
need refactoring. I *may* squeeze in some time to at least bring sanity
to this loop... but wow... =]

Thanks to Jakob for helpful hints on the way here, and the review.

llvm-svn: 160443
2012-07-18 18:58:22 +00:00
Victor Oliveira a1de408aa7 test commit
llvm-svn: 160438
2012-07-18 17:53:05 +00:00
Jack Carter a62ba82825 Mips specific inline asm operand modifier 'M':
Print the high order register of a double word register operand.

In 32 bit mode, a 64 bit double word integer will be represented
by 2 32 bit registers. This modifier causes the high order register
to be used in the asm expression. It is useful if you are using 
doubles in assembler and continue to control register to variable
relationships.

This patch also fixes a related bug in a previous patch:

    case 'D': // Second part of a double word register operand
    case 'L': // Low order register of a double word register operand
    case 'M': // High order register of a double word register operand

I got 'D' and 'M' confused. The second part of a double word operand
will only match 'M' for one of the endianesses. I had 'L' and 'D'
be the opposite twins when 'L' and 'M' are.

llvm-svn: 160429
2012-07-18 06:41:36 +00:00
Joel Jones b84f7bea09 More replacing of target-dependent intrinsics with target-indepdent
intrinsics.  The second instruction(s) to be handled are the vector versions 
of count set bits (ctpop).

The changes here are to clang so that it generates a target independent 
vector ctpop when it sees an ARM dependent vector bits set count.  The changes 
in llvm are to match the target independent vector ctpop and in 
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM 
dependent vector pop counts with target-independent ctpops.  There are also 
changes to an existing test case in llvm for ARM vector count instructions and 
to a test for the bitcode upgrade.

<rdar://problem/11892519>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160410
2012-07-18 00:02:16 +00:00
Evan Cheng f73d7553cc Add test case for r160387
llvm-svn: 160389
2012-07-17 19:40:05 +00:00
Nadav Rotem 277a40bc0a Fix a crash in the legalization of large vectors.
When truncating a result of a vector that is split we need
to use the result of the split vector, and not re-split the dead node.

llvm-svn: 160357
2012-07-17 09:07:37 +00:00
Evan Cheng 780f9b5f92 Implement r160312 as target indepedenet dag combine.
llvm-svn: 160354
2012-07-17 08:31:11 +00:00
Evan Cheng f579beca6d This is another case where instcombine demanded bits optimization created
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346
2012-07-17 06:53:39 +00:00
Akira Hatanaka 046744467d Fix function select_cc_f32 in test/CodeGen/Mips/selectcc.ll.
llvm-svn: 160329
2012-07-16 23:56:51 +00:00
Evan Cheng 75315b877c For something like
uint32_t hi(uint64_t res)
{
        uint_32t hi = res >> 32;
        return !hi;
}

llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
  %lnot = icmp ult i64 %res, 4294967296
  %lnot.ext = zext i1 %lnot to i32
  ret i32 %lnot.ext
}

The optimizer has optimize away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
        movabsq $4294967296, %rax       ## imm = 0x100000000
        cmpq    %rax, %rdi
        sbbl    %eax, %eax
        andl    $1, %eax

This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
        shrq    $32, %rdi
        testl   %edi, %edi
        sete    %al
        movzbl  %al, %eax

It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X >  0x0ffffffff -> (X >> 32) >= 1

rdar://11866926

llvm-svn: 160312
2012-07-16 19:35:43 +00:00
Nadav Rotem 839a06e9d7 Make ComputeDemandedBits return a deterministic result when computing an AssertZext value.
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160305
2012-07-16 18:34:53 +00:00
Tom Stellard fc3db614c0 Revert "test/CodeGen/R600: Add some basic tests v6"
This reverts commit 11d3457afcda7848448dd7f11b2ede6552ffb9ea.

llvm-svn: 160300
2012-07-16 18:19:43 +00:00
Alexey Samsonov 893d3d336a Fix tests that failed on i686-win32 after r160248:
1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp.
2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of
vector add.

llvm-svn: 160274
2012-07-16 14:33:36 +00:00
Tom Stellard 6693fbe3eb test/CodeGen/R600: Add some basic tests v6
llvm-svn: 160273
2012-07-16 14:17:19 +00:00
Nadav Rotem 4968e45b9f Fix a bug in the 3-address conversion of LEA when one of the operands is an
undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def, is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160260
2012-07-16 10:52:25 +00:00
Alexey Samsonov dcc1291d17 This CL changes the function prologue and epilogue emitted on X86 when stack needs realignment.
It is intended to fix PR11468.

Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp

The problem was to reference the locations of callee-saved registers in exception handling:
locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would
take some effort to implement this in LLVM, as currently MachineLocation can only have the form
"Register + Offset". Funciton prologue and epilogue are now changed to:

push %rbp
mov %rsp, %rbp
push %14
push %15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp

Reviewed by Chad Rosier.

llvm-svn: 160248
2012-07-16 06:54:09 +00:00
Nadav Rotem 3050e07108 Fix a bug in the scalarization of BUILD_VECTOR. BUILD_VECTOR elements may be wider than the output element type. Make sure to trunc them if needed.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160235
2012-07-15 20:39:08 +00:00
Nadav Rotem eec74c7279 Teach getTargetVShiftNode about TargetConstant nodes.
llvm-svn: 160234
2012-07-15 20:27:43 +00:00
NAKAMURA Takumi 032dc0a06c llvm/test/CodeGen/X86/2012-07-15-broadcastfold.ll: Rewrite expressions to fit various targets.
- Make sure existence of "barrier".
  - Confirm reload corresponding to spill.

llvm-svn: 160232
2012-07-15 14:38:35 +00:00
Nadav Rotem ee3552f88d Rename VBROADCASTSDrm into VBROADCASTSDYrm to match the naming convention.
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.

PR12782.

Together with Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 160230
2012-07-15 12:26:30 +00:00
Nadav Rotem 9466e81df6 AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector.
This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions.

llvm-svn: 160222
2012-07-14 22:26:05 +00:00
Nadav Rotem 018921002e Add a dagcombine optimization to convert concat_vectors of undefs into a single undef.
The unoptimized concat_vectors isd prevented the canonicalization of the vector_shuffle node.

llvm-svn: 160221
2012-07-14 21:30:27 +00:00
Joel Jones 43cb87839c This is one of the first steps at moving to replace target-dependent
intrinsics with target-indepdent intrinsics.  The first instruction(s) to be 
handled are the vector versions of count leading zeros (ctlz).

The changes here are to clang so that it generates a target independent 
vector ctlz when it sees an ARM dependent vector ctlz.  The changes in llvm 
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp 
to update any existing bc files containing ARM dependent vector ctlzs with 
target-independent ctlzs.  There are also changes to an existing test case in 
llvm for ARM vector count instructions and a new test for the bitcode upgrade.

<rdar://problem/11831778>

There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>

llvm-svn: 160200
2012-07-13 23:25:25 +00:00
Duncan Sands a9c373e49d Restrict this to x86, hopefully fixing ARM buildbots.
llvm-svn: 160163
2012-07-13 07:02:00 +00:00
Benjamin Kramer 4d0916788d Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it.
I already had the necessary things in place for IR-level passes but missed the machine passes.

llvm-svn: 160137
2012-07-12 18:14:57 +00:00
Nadav Rotem fdce33a495 The LIT tests below do not specify the exact cpu model and fail on AVX2 machines, because we select different instructions such as vbroadcast, new shuffles, etc.
Patch by Michael Liao.

llvm-svn: 160129
2012-07-12 13:45:15 +00:00
NAKAMURA Takumi f415fe70f3 llvm/test/CodeGen/X86/rdrand.ll: Relax expression corresponding to Win64 CC.
llvm-svn: 160124
2012-07-12 10:22:57 +00:00
Benjamin Kramer cbac2f3bc9 Use %s instead of the explicit name, the latter doesn't work in out-of-tree builds.
llvm-svn: 160120
2012-07-12 09:36:29 +00:00
Benjamin Kramer 0ab2794eda Add intrinsics for Ivy Bridge's rdrand instruction.
The rdrand/cmov sequence is the same that is emitted by both
GCC and ICC.

Fixes PR13284.

llvm-svn: 160117
2012-07-12 09:31:43 +00:00
Duncan Sands 671cc2575d The result type of EXTRACT_VECTOR_ELT doesn't have to match the element type of
the input vector, it can be bigger (this is helpful for powerpc where <2 x i16>
is a legal vector type but i16 isn't a legal type, IIRC).  However this wasn't
being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220.
Lightly tweaked version of a patch by Michael Liao.

llvm-svn: 160116
2012-07-12 09:01:35 +00:00
Craig Topper f7755df776 Update GATHER instructions to support 2 read-write operands. Patch from myself and Manman Ren.
llvm-svn: 160110
2012-07-12 06:52:41 +00:00
Manman Ren 34cb93e192 ARM: Fix optimizeCompare to correctly check safe condition.
It is safe if CPSR is killed or re-defined.
When we are done with the basic block, check whether CPSR is live-out.
Do not optimize away cmp if CPSR is live-out.

llvm-svn: 160090
2012-07-11 22:51:44 +00:00
Akira Hatanaka 20dced4dbb Test case for r160036.
llvm-svn: 160067
2012-07-11 19:50:46 +00:00
Manman Ren 1553ce0e81 X86: Update to peephole optimization to move Movr0 before (Sub, Cmp) pair.
When Movr0 is between sub and cmp, we move Movr0 before sub if it enables
removal of Cmp.

llvm-svn: 160066
2012-07-11 19:35:12 +00:00
Akira Hatanaka 24cf4e36e5 Implement MipsTargetLowering::LowerSELECT_CC to custom lower SELECT_CC.
llvm-svn: 160064
2012-07-11 19:32:27 +00:00
Benjamin Kramer 3aab6a86a2 PR13326: Fix a subtle edge case in the udiv -> magic multiply generator.
This caused 6 of 65k possible 8 bit udivs to be wrong.

llvm-svn: 160058
2012-07-11 18:31:59 +00:00
Nadav Rotem d2bdcebb14 When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers.
llvm-svn: 160044
2012-07-11 13:27:05 +00:00
Akira Hatanaka 878ad8b28d Lower RETURNADDR node in Mips backend.
Patch by Sasa Stankovic.

llvm-svn: 160031
2012-07-11 00:53:32 +00:00
Jack Carter e8cb2fc616 Mips specific inline asm operand modifier 'L'.
Low order register of a double word register operand. Operands 
   are defined by the name of the variable they are marked with in
   the inline assembler code. This is a way to specify that the 
   operand just refers to the low order register for that variable.
   
   It is the opposite of modifier 'D' which specifies the high order
   register.
   
   Example:
   
 main()
{

    long long ll_input = 0x1111222233334444LL;
    long long ll_val = 3;
    int i_result = 0;

    __asm__ __volatile__( 
		   "or	%0, %L1, %2"
	     : "=r" (i_result) 
	     : "r" (ll_input), "r" (ll_val)); 
}

   Which results in:
   
   	lui	$2, %hi(_gp_disp)
	addiu	$2, $2, %lo(_gp_disp)
	addiu	$sp, $sp, -8
	addu	$2, $2, $25
	sw	$2, 0($sp)
	lui	$2, 13107
	ori	$3, $2, 17476     <-- Low 32 bits of ll_input
	lui	$2, 4369
	ori	$4, $2, 8738      <-- High 32 bits of ll_input
	addiu	$5, $zero, 3  <-- Low 32 bits of ll_val
	addiu	$2, $zero, 0  <-- High 32 bits of ll_val
	#APP
	or	$3, $4, $5        <-- or i_result, high 32 ll_input, low 32 of ll_val
	#NO_APP
	addiu	$sp, $sp, 8
	jr	$ra

If not direction is done for the long long for 32 bit variables results
in using the low 32 bits as ll_val shows.

There is an existing bug if 'L' or 'D' is used for the destination register
for 32 bit long longs in that the target value will be updated incorrectly
for the non-specified part unless explicitly set within the inline asm code.

llvm-svn: 160028
2012-07-10 22:41:20 +00:00
Chad Rosier 3ee9a4c29e Add newline.
llvm-svn: 160006
2012-07-10 17:57:00 +00:00
Chad Rosier 579b1fee6b Add test case accidentally omitted from r160002.
llvm-svn: 160004
2012-07-10 17:49:39 +00:00
Chad Rosier bdb08ac50a Add support for dynamic stack realignment in the presence of dynamic allocas on
X86.  Basically, this is a reapplication of r158087 with a few fixes.

Specifically, (1) the stack pointer is restored from the base pointer before
popping callee-saved registers and (2) in obscure cases (see comments in patch)
we must cache the value of the original stack adjustment in the prologue and
apply it in the epilogue.

rdar://11496434

llvm-svn: 160002
2012-07-10 17:45:53 +00:00
Nadav Rotem d908ddc186 Improve the loading of load-anyext vectors by allowing the codegen to load
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.

llvm-svn: 159991
2012-07-10 13:25:08 +00:00
Akira Hatanaka efff7b763b Make register Mips::RA allocatable if not in mips16 mode.
llvm-svn: 159971
2012-07-10 00:19:06 +00:00
Owen Anderson d4b841f8f9 Teach the DAG combiner to turn sitofp/uitofp from i1 into a conditional move, since there are only two possible values.
Previously, this would become an integer extension operation, followed by a real integer->float conversion.

llvm-svn: 159957
2012-07-09 20:31:12 +00:00
Manman Ren 5f6fa428fa X86: implement functions to analyze & synthesize CMOV|SET|Jcc
getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond

No functional change intended.
If we want to update the condition code of CMOV|SET|Jcc, we first analyze the
opcode to get the condition code, then update the condition code, finally
synthesize the new opcode form the new condition code.

llvm-svn: 159955
2012-07-09 18:57:12 +00:00
Manman Ren bb36074047 X86: Fix optimizeCompare to correctly check safe condition.
It is safe if EFLAGS is killed or re-defined.
When we are done with the basic block, check whether EFLAGS is live-out.
Do not optimize away cmp if EFLAGS is live-out.

llvm-svn: 159888
2012-07-07 03:34:46 +00:00
Manman Ren c965673707 X86: peephole optimization to remove cmp instruction
For each Cmp, we check whether there is an earlier Sub which make Cmp
redundant. We handle the case where SUB operates on the same source operands as
Cmp, including the case where the two source operands are swapped.

llvm-svn: 159838
2012-07-06 17:36:20 +00:00
Chad Rosier 88d53eae56 [fast-isel] Tell fast-isel to do nothing with the new donothing intrinsic.
llvm-svn: 159837
2012-07-06 17:33:39 +00:00
Duncan Sands c65aa3f6ae Attempt to fix windows buildbots. Patch by James Benton.
llvm-svn: 159826
2012-07-06 14:43:16 +00:00
NAKAMURA Takumi 4f934676fb test/CodeGen/X86/sext-setcc-self.ll: Mark it as XFAIL: cygwin,mingw32,win32. Investigating.
llvm-svn: 159820
2012-07-06 12:12:39 +00:00
NAKAMURA Takumi 0246724cd6 Revert r159804, "[arm-fast-isel] Add support for vararg function calls."
It broke LLVM :: CodeGen/Thumb2/large-call.ll on several hosts.

llvm-svn: 159817
2012-07-06 11:12:44 +00:00
Jush Lu 5e6e6264f4 [arm-fast-isel] Add support for vararg function calls.
llvm-svn: 159804
2012-07-06 03:02:37 +00:00
Jack Carter b2af512cef Mips specific inline asm operand modifier D.
Print the second half of a double word operand.
   
   The include list was cleaned up a bit as well.
   
   Also the test case was modified to test for both
   big and little patterns.
   

llvm-svn: 159787
2012-07-05 23:58:21 +00:00
Akira Hatanaka bbf374c4c6 test case for r159770.
llvm-svn: 159771
2012-07-05 19:29:31 +00:00
Duncan Sands 0552a2cad2 Use the right kind of booleans: we were emitting 0/1 booleans, instead of 0/-1
booleans.  Patch by James Benton.

llvm-svn: 159739
2012-07-05 09:32:46 +00:00
Jakob Stoklund Olesen 2dee812445 Ensure CopyToReg nodes are always glued to the call instruction.
The CopyToReg nodes that set up the argument registers before a call
must be glued to the call instruction. Otherwise, the scheduler may emit
the physreg copies long before the call, causing long live ranges for
the fixed registers.

Besides disabling good register allocation, that can also expose
problems when EmitInstrWithCustomInserter() splits a basic block during
the live range of a physreg.

llvm-svn: 159721
2012-07-04 19:28:31 +00:00
Rafael Espindola 1a7cf13215 Add a testcase for pr13209. It is not a great test, but it still fails if
159509 and 159479 are reverted. It would be really nice to be able to run
just the coalescer :-(

llvm-svn: 159715
2012-07-04 16:06:00 +00:00
Jakob Stoklund Olesen 49e4d4b3ef Add early if-conversion support to X86.
Implement the TII hooks needed by EarlyIfConversion to create cmov
instructions and estimate their latency.

Early if-conversion is still not enabled by default.

llvm-svn: 159695
2012-07-04 00:09:58 +00:00
NAKAMURA Takumi 2338556320 test/CodeGen/SPARC/private.ll: Fixup. Forgot to prune old RUN lines.
llvm-svn: 159643
2012-07-03 04:29:20 +00:00
NAKAMURA Takumi c2a5bd6822 test/CodeGen/SPARC/private.ll: FileCheck-ize.
llvm-svn: 159642
2012-07-03 04:21:57 +00:00
NAKAMURA Takumi dff1a78321 test/CodeGen/X86/sincos.ll: FileCheck-ize.
llvm-svn: 159639
2012-07-03 03:59:22 +00:00
NAKAMURA Takumi 10dc235746 test/CodeGen/X86/fabs.ll: FileCheck-ize.
llvm-svn: 159638
2012-07-03 03:59:15 +00:00
NAKAMURA Takumi ff680b1db6 test/CodeGen/X86/2007-09-05-InvalidAsm.ll: FileCheck-ize.
llvm-svn: 159637
2012-07-03 03:59:08 +00:00
NAKAMURA Takumi e5e19e4f7b test/CodeGen/X86/2004-03-30-Select-Max.ll: FileCheck-ize.
llvm-svn: 159636
2012-07-03 03:58:59 +00:00
Jack Carter b353094f27 mips32 long long register inline asm constraint support.
inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed.    This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll)
    

llvm-svn: 159625
2012-07-02 23:35:23 +00:00
Eric Christopher dfc3e68c40 Revert " mips32 long long register inline asm constraint support." as
it appears to be breaking the bots.

This reverts commit 1b055ce320fa13f6f1ac81670d11b45e01f79876.

llvm-svn: 159619
2012-07-02 23:22:25 +00:00
Jack Carter 939236c2eb deleted test/CodeGen/Mips/inlineasm-cnstrnt-bad-r-1.ll
llvm-svn: 159617
2012-07-02 23:21:22 +00:00
Jack Carter 5c1a01a625 mips32 long long register inline asm constraint support.
inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed.    This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll)
    

llvm-svn: 159610
2012-07-02 22:39:45 +00:00
Bob Wilson cac3b90633 Extend TargetPassConfig to allow running only a subset of the normal passes.
This is still a work in progress but I believe it is currently good enough
to fix PR13122 "Need unit test driver for codegen IR passes".  For example,
you can run llc with -stop-after=loop-reduce to have it dump out the IR after
running LSR.  Serializing machine-level IR is not yet supported but we have
some patches in progress for that.

The plan is to serialize the IR to a YAML file, containing separate sections
for the LLVM IR, machine-level IR, and whatever other info is needed.  Chad
suggested that we stash the stop-after pass in the YAML file and use that
instead of the start-after option to figure out where to restart the
compilation.  I think that's a great idea, but since it's not implemented yet
I put the -start-after option into this patch for testing purposes.

llvm-svn: 159570
2012-07-02 19:48:45 +00:00
Chandler Carruth ff123d5c63 Fix the remaining TCL-style quotes found in the testsuite. This is
another mechanical change accomplished though the power of terrible Perl
scripts.

I have manually switched some "s to 's to make escaping simpler.

While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.

Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/

llvm-svn: 159547
2012-07-02 19:09:46 +00:00
Chandler Carruth 5da53436d5 Convert the uses of '|&' to use '2>&1 |' instead, which works on old
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.

This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.

llvm-svn: 159544
2012-07-02 18:37:59 +00:00
Bob Wilson 2297221028 Do not attempt to use ROR for Thumb1.
Patch by Matt Fischer!

llvm-svn: 159538
2012-07-02 17:22:47 +00:00
Chandler Carruth 872ac7cfad Fix the TCL-style quoting in one random test that somehow slipped
through my perl nets.

With this, the test suite passes even if I force it to run with the
built-in shell test logic, except for a test which REQUIREs shell.

llvm-svn: 159529
2012-07-02 13:29:47 +00:00
Chandler Carruth a5a29f970e Convert all tests using TCL-style quoting to use shell-style quoting.
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.

If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.

Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.

Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s

llvm-svn: 159525
2012-07-02 12:47:22 +00:00
Chandler Carruth ae00a80869 Rewrite three tests that had truly egregious abuses of 'grep' in them to
use FileCheck.

Aside from removing a dependence on TCL-style quoting, this also makes
the tests ... significantly more robust. =] It would be really, *really*
great of the maintainer(s) of the CellSPU backend went through and
systematically rewrite these tests to use FileCheck. There are a lot
more that have nearly this bad of abuses.

Another step along the path to a TclTest-free testsuite.

llvm-svn: 159523
2012-07-02 12:20:14 +00:00
Rafael Espindola a77d31d7fd Now that RegistersDefinedFromSameValue handles one instruction being an
implicit_def, the other instruction can be anything, including instructions
that define multiple values. Be careful about that and don't assume what operand
0 is.
Fixes pr13249.

llvm-svn: 159509
2012-07-01 17:08:01 +00:00
Elena Demikhovsky 9af899fa88 Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2.
llvm-svn: 159504
2012-07-01 06:12:26 +00:00
Jakob Stoklund Olesen 3e3cdecf98 Clear kill flags in InstrEmitter::EmitSubregNode().
When a local virtual register is made global, make sure to clear any
existing kill flags.

llvm-svn: 159461
2012-06-29 21:00:03 +00:00
Rafael Espindola efdfb1e6b2 In the initial exec mode we always do a load to find the address of a variable.
Before this patch in pic 32 bit code we would add the global base register
and not load from that address. This is a really old bug, but before the
introduction of the tls attributes we would never select initial exec for
pic code.

llvm-svn: 159409
2012-06-29 04:22:35 +00:00
Manman Ren 98a5bf24a9 X86: add more GATHER intrinsics in LLVM
Corrected type for index of llvm.x86.avx2.gather.d.pd.256
  from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
  from 256-bit to 128-bit.

Support the following intrinsics:
  llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
  llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
  llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
  llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256

llvm-svn: 159402
2012-06-29 00:54:20 +00:00
Nuno Lopes ec9653b363 add a new @llvm.donothing intrinsic that, well, does nothing, and teach CodeGen to ignore calls to it
llvm-svn: 159383
2012-06-28 22:30:12 +00:00
Jack Carter 6c0bc0b378 The Mips specific inline asm operand modifier 'z' has the
following description in the gnu sources:

    Print $0 if operand is zero otherwise print the op normally.

llvm-svn: 159324
2012-06-28 01:33:40 +00:00
Akira Hatanaka ad31cd9a01 Test case for r159240.
llvm-svn: 159242
2012-06-27 00:40:34 +00:00
Rafael Espindola e0eaa043eb Fix llc's -print-before=pass and -print-after=pass.
llvm-svn: 159227
2012-06-26 21:33:36 +00:00
Manman Ren a09820414a X86: add GATHER intrinsics (AVX2) in LLVM
Support the following intrinsics:
llvm.x86.avx2.gather.d.pd, llvm.x86.avx2.gather.q.pd
llvm.x86.avx2.gather.d.pd.256, llvm.x86.avx2.gather.q.pd.256
llvm.x86.avx2.gather.d.ps, llvm.x86.avx2.gather.q.ps
llvm.x86.avx2.gather.d.ps.256, llvm.x86.avx2.gather.q.ps.256

Modified Disassembler to handle VSIB addressing mode.

llvm-svn: 159221
2012-06-26 19:47:59 +00:00
Jack Carter 5e69cffed5 There are a number of generic inline asm operand modifiers that
up to r158925 were handled as processor specific. Making them 
generic and putting tests for these modifiers in the CodeGen/Generic
directory caused a number of targets to fail. 

This commit addresses that problem by having the targets call 
the generic routine for generic modifiers that they don't currently
have explicit code for.

For now only generic print operands 'c' and 'n' are supported.vi


Affected files:

    test/CodeGen/Generic/asm-large-immediate.ll
    lib/Target/PowerPC/PPCAsmPrinter.cpp
    lib/Target/NVPTX/NVPTXAsmPrinter.cpp
    lib/Target/ARM/ARMAsmPrinter.cpp
    lib/Target/XCore/XCoreAsmPrinter.cpp
    lib/Target/X86/X86AsmPrinter.cpp
    lib/Target/Hexagon/HexagonAsmPrinter.cpp
    lib/Target/CellSPU/SPUAsmPrinter.cpp
    lib/Target/Sparc/SparcAsmPrinter.cpp
    lib/Target/MBlaze/MBlazeAsmPrinter.cpp
    lib/Target/Mips/MipsAsmPrinter.cpp
    
MSP430 isn't represented because it did not even run with
the long existing 'c' modifier and it was not apparent what
needs to be done to get it inline asm ready.

Contributer: Jack Carter
llvm-svn: 159203
2012-06-26 13:49:27 +00:00
Elena Demikhovsky 26088d2e24 Shuffle optimization for AVX/AVX2.
The current patch optimizes frequently used shuffle patterns and gives these instruction sequence reduction.
Before:
      vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3]
       vpermilps       $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3]
       vextractf128    $1, %ymm1, %xmm1
       vextractf128    $1, %ymm0, %xmm0
       vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3]
       vpermilps       $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3]
       vinsertf128     $1, %xmm0, %ymm2, %ymm0
After:
      vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
      vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4]
      vunpcklps       %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5]

llvm-svn: 159188
2012-06-26 08:04:10 +00:00
Andrew Trick fb2ba3e1cb Enable the new LoopInfo algorithm by default.
The primary advantage is that loop optimizations will be applied in a
stable order. This helps debugging and unit test creation. It is also
a better overall implementation without pathologically bad performance
on deep functions.

On large functions (llvm-stress --size=200000 | opt -loops)
Before: 0.1263s
After:  0.0225s

On deep functions (after tweaking llvm-stress, thanks Nadav):
Before: 0.2281s
After:  0.0227s

See r158790 for more comments.

The loop tree is now consistently generated in forward order, but loop
passes are applied in reverse order over the program. If we have a
loop optimization that prefers forward order, that can easily be
achieved by adding a different type of LoopPassManager.

llvm-svn: 159183
2012-06-26 04:11:38 +00:00
Eli Friedman bbcd09cc00 Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196.
llvm-svn: 159176
2012-06-25 23:42:33 +00:00
Manman Ren 606953fbe7 ARM: update peephole optimization.
More condition codes are included when deciding whether to remove cmp after
a sub instruction. Specifically, we extend from GE|LT|GT|LE to 
GE|LT|GT|LE|HS|LS|HI|LO|EQ|NE. If we have "sub a, b; cmp b, a; movhs", we
should be able to replace with "sub a, b; movls".

rdar: 11725965
llvm-svn: 159166
2012-06-25 21:49:38 +00:00
Jakob Stoklund Olesen a57fc12ec9 Enforce stricter liveness rules for PHIs.
Verify that all paths from the entry block to a virtual register read
pass through a def. Enable this check even when MRI->isSSA() is false.

Verify that the live range of a virtual register is live out of all
predecessor blocks, even for PHI-values.

This requires that PHIElimination sometimes inserts IMPLICIT_DEF
instruction in predecessor blocks.

llvm-svn: 159150
2012-06-25 18:18:27 +00:00
Jakob Stoklund Olesen eb49566447 Run ProcessImplicitDefs on SSA form where it can be much simpler.
Implicitly defined virtual registers can simply have the <undef> bit set
on all uses, and copies can be turned into implicit defs recursively.

Physical registers are a bit trickier. We handle the common case where a
physreg def is used by a nearby instruction in the same basic block. For
more complicated cases, just leave the IMPLICIT_DEF instruction in.

llvm-svn: 159149
2012-06-25 18:12:18 +00:00
Jakob Stoklund Olesen 2e22e6a361 %RCX is not a function live-out in eh.return functions.
The function live-out registers must be live at all function returns,
and %RCX is only used by eh.return. When a function also has a normal
return, only %RAX holds a return value.

This fixes PR13188.

llvm-svn: 159116
2012-06-24 15:53:01 +00:00
Pete Cooper fe212e762f DAG legalisation can now handle illegal fma vector types by scalarisation
llvm-svn: 159092
2012-06-24 00:05:44 +00:00
Hans Wennborg cbe34b4cc9 Extend the IL for selecting TLS models (PR9788)
This allows the user/front-end to specify a model that is better
than what LLVM would choose by default. For example, a variable
might be declared as

  @x = thread_local(initialexec) global i32 42

if it will not be used in a shared library that is dlopen'ed.

If the specified model isn't supported by the target, or if LLVM can
make a better choice, a different model may be used.

llvm-svn: 159077
2012-06-23 11:37:03 +00:00
Rafael Espindola a3088f09b3 Handle aliases to tls variables in all architectures, not just x86.
llvm-svn: 159058
2012-06-23 00:30:03 +00:00
Evan Cheng 68c2f9a9a7 (sub X, imm) gets canonicalized to (add X, -imm)
There are patterns to handle immediates when they fit in the immediate field.
e.g. %sub = add i32 %x, -123
=>   sub r0, r0, #123
Add patterns to catch immediates that do not fit but should be materialized
with a single movw instruction rather than movw + movt pair.
e.g. %sub = add i32 %x, -65535
=>   movw r1, #65535
     sub r0, r0, r1

rdar://11726136

llvm-svn: 159057
2012-06-23 00:29:06 +00:00
Hal Finkel 460e94d842 Add support for the PPC isel instruction.
The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045
2012-06-22 23:10:08 +00:00
Chad Rosier 1ce3805b23 FileCheckize tests.
llvm-svn: 159044
2012-06-22 23:04:02 +00:00
Lang Hames c98ebda325 Rename fp-op fusion option (yet again) for compatibility with GCC option.
llvm-svn: 159042
2012-06-22 22:31:00 +00:00
Evan Cheng f5bd6c6510 EmitZerofill should take a 64-bit size or else it's chopping off large zero-filled global. rdar://11729134
llvm-svn: 159023
2012-06-22 20:14:46 +00:00
NAKAMURA Takumi c384b95939 test/CodeGen/Generic/asm-large-immediate.ll: Mark it as XFAIL: powerpc, possibly due to r158939.
llvm-svn: 158994
2012-06-22 13:41:00 +00:00
Jakob Stoklund Olesen 321d41a871 Functions calling __builtin_eh_return must have a frame pointer.
The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame
pointer exists, but the frame pointer was forced by the presence of
llvm.eh.unwind.init which isn't guaranteed.

If llvm.eh.unwind.init is actually required in functions calling
eh.return (is it?), we should diagnose that instead of emitting bad
machine code.

This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot.

llvm-svn: 158961
2012-06-22 03:04:27 +00:00
Andrew Trick 3ccb1b8cf9 ARM scheduling fix: compute predicated implicit use properly.
Minor drive by fix to cleanup latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.

llvm-svn: 158959
2012-06-22 02:50:31 +00:00
Lang Hames b8650f106a Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).

This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.

Fast mode - allows formation of fused FP ops whenever they're profitable.

Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.

Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.

Note: This option only controls formation of fused ops by the optimizers.  Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.

Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.

llvm-svn: 158956
2012-06-22 01:09:09 +00:00
Jack Carter c457f62033 The inline asm operand modifier 'n' is suppose
to be generic across architectures. It has the
following description in the gnu sources:

    Negate the immediate constant

Several Architectures such as x86 have local implementations
of operand modifier 'n' which go beyond the above description
slightly. This won't affect them.

Affected files:

    lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
        Added 'n' to the switch cases.

    test/CodeGen/Generic/asm-large-immediate.ll
        Generic compiled test (x86 for me)

    test/CodeGen/Mips/asm-large-immediate.ll
        Mips compiled version of the generic one

Contributer: Jack Carter
llvm-svn: 158939
2012-06-21 21:37:54 +00:00
Akira Hatanaka 765c312314 1. fix null program output after some other changes
2. re-enable null.ll test
3. fix some minor style violations

Patch by Reed Kotler.

llvm-svn: 158935
2012-06-21 20:39:10 +00:00
Hal Finkel a86b0f20dd Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC.
Thanks to Tobias von Koch for pointing out this problem.

llvm-svn: 158932
2012-06-21 20:10:48 +00:00
Jack Carter b2fd5f66b4 The inline asm operand modifier 'c' is suppose
to be generic across architectures. It has the
following description in the gnu sources:

    Substitute immediate value without immediate syntax

Several Architectures such as x86 have local implementations
of operand modifier 'c' which go beyond the above description
slightly. To make use of the generic modifiers without overriding
local implementation one can make a call to the base class method
for AsmPrinter::PrintAsmOperand() in the locally derived method's 
"default" case in the switch statement. That way if it is already
defined locally the generic version will never get called.

This change is needed when test/CodeGen/generic/asm-large-immediate.ll
failed on a native Mips board. The test was assuming a generic
implementation was in place.

Affected files:

    lib/Target/Mips/MipsAsmPrinter.cpp:
        Changed the default case to call the base method.
    lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
        Added 'c' to the switch cases.
    test/CodeGen/Mips/asm-large-immediate.ll
        Mips compiled version of the generic one

Contributer: Jack Carter
llvm-svn: 158925
2012-06-21 17:14:46 +00:00
NAKAMURA Takumi 613663cfe2 Revert r158209, "test/CodeGen/Generic/APIntLoadStore.ll: Mark as XFAIL:ppc since r157911."
It passes according to ppc changes.

llvm-svn: 158917
2012-06-21 13:43:06 +00:00
Lang Hames 90b2a4cbad Add a missing llvm.fma -> VFNMS pattern to the ARM backend.
llvm-svn: 158902
2012-06-21 06:10:00 +00:00
Evan Cheng 8c2ad81238 Emit a single _udivmodsi4 libcall instead of two separate _udivsi3 and
_umodsi3 libcalls if they have the same arguments. This optimization
was apparently broken if one of the node was replaced in place.
rdar://11714607

llvm-svn: 158900
2012-06-21 05:56:05 +00:00
Jakob Stoklund Olesen 51c63e64e3 Remove the -live-regunits command line option.
Register allocators depend on it being permanently enabled now.

llvm-svn: 158873
2012-06-20 23:31:34 +00:00
Jakob Stoklund Olesen 833308d785 Only update regunit live ranges that have been precomputed.
Regunit live ranges are computed on demand, so when mi-sched calls
handleMove, some regunits may not have live ranges yet.

That makes updating them easier: Just skip the non-existing ranges. They
will be computed correctly from the rescheduled machine code when they
are needed.

llvm-svn: 158831
2012-06-20 18:00:57 +00:00
Hal Finkel ca542beffe Add support for generating reg+reg (indexed) pre-inc loads on PPC.
llvm-svn: 158823
2012-06-20 15:43:03 +00:00
Craig Topper b9e8e18949 Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases.
llvm-svn: 158792
2012-06-20 05:39:26 +00:00
Lang Hames 39fb1d08dc Add DAG-combines for aggressive FMA formation.
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.

llvm-svn: 158757
2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen 77a0cfb19a Add a triple.
The test was failing on Linux because of asm syntax differences.

llvm-svn: 158748
2012-06-19 21:46:25 +00:00
Jakob Stoklund Olesen 0f855e4263 Implement PPCInstrInfo::isCoalescableExtInstr().
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.

This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.

This is related to PR5997.

llvm-svn: 158743
2012-06-19 21:14:34 +00:00
Hal Finkel 1cc27e44a4 Add support for generating reg+reg preinc stores on PPC.
PPC will now generate STWUX and friends.

llvm-svn: 158698
2012-06-19 02:34:32 +00:00
Rafael Espindola 31567515ed really add a triple :-(
llvm-svn: 158696
2012-06-19 02:17:35 +00:00
Rafael Espindola f2ae4075c8 Add a triple to the test.
llvm-svn: 158695
2012-06-19 01:42:34 +00:00
Rafael Espindola ca3e0ee8b3 Move the support for using .init_array from ARM to the generic
TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM,
on X86 it is not easy to find out if .init_array should be used or not, so
the decision is made via TargetOptions and defaults to off.

Add a command line option to llc that enables it.

llvm-svn: 158692
2012-06-19 00:48:28 +00:00
Manman Ren 6e1fd46fdf ARM: use NOEN loads and stores if possible when handling struct byval.
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 158684
2012-06-18 22:23:48 +00:00
Joel Jones 3237ce737e This change handles a another case for generating the bic instruction
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.  The 8 bit case doesn't
need to be handled, as the 8 bit constants are encoded directly, thereby
not needing a separate load instruction to form the constant into a register.

<rdar://problem/11481151>

llvm-svn: 158659
2012-06-18 14:51:32 +00:00
Chandler Carruth a1da0bf5ef Add a regression test for the bug exposed by r158087, which has been
temporarily reverted.

This test is annoyingly overspecified, but I don't know of another way
to thoroughly test the saving and restoring of the registers. While this
will have to be adjusted even with the issue fixed in order to re-apply
r158087, those adjustments should very clearly indicate that it is still
correct (%esp getting restored prior to pops), whereas without it, this
case can easily slip under the radar.

Still, any suggestions for improvements are very welcome.

All credit to Matt Beaumont-Gay for reducing this out of an insane
Address Sanitizer crash to a reasonably small seg-faulting C program
when built with -mstackrealign. I just reduced it to IR, which was much
simpler. =]

llvm-svn: 158656
2012-06-18 09:15:04 +00:00
Chandler Carruth 2cc11fd8c7 Temporarily revert r158087.
This patch causes problems when both dynamic stack realignment and
dynamic allocas combine in the same function. With this patch, we no
longer build the epilog correctly, and silently restore registers from
the wrong position in the stack.

Thanks to Matt for tracking this down, and getting at least an initial
test case to Chad. I'm going to try to check a variation of that test
case in so we can easily track the fixes required.

llvm-svn: 158654
2012-06-18 07:03:12 +00:00
Hal Finkel 6261c2dc28 Cleanup trip-count finding for PPC CTR loops (and some bug fixes).
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.

In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).

llvm-svn: 158607
2012-06-16 20:34:07 +00:00
Manman Ren e0763c7472 ARM: optimization for sub+abs.
This patch will optimize abs(x-y)
FROM
sub, movs, rsbmi
TO
subs, rsbmi

For abs, we will use cmp instead of movs. This is necessary because we already
have an existing peephole pass which optimizes away cmp following sub.

rdar: 11633193
llvm-svn: 158551
2012-06-15 21:32:12 +00:00
Jakob Stoklund Olesen a15a224db0 Preserve <undef> flags in ARMExpandPseudo.
This probably mostly shows up in bugpoint-generated code.

llvm-svn: 158527
2012-06-15 17:46:54 +00:00
Akira Hatanaka d8ab16b86f 1. introduce MipsPat in place of Pat in order to exclude those from
being used by Mips16 or Micro Mips
2. clean up a few lines too long encountered

Patch by Reed Kotler.

llvm-svn: 158470
2012-06-14 21:03:23 +00:00
Akira Hatanaka 1b420ac4c8 Make machine verifier check the first instruction of the last bundle instead of
the last instruction of a basic block.

llvm-svn: 158468
2012-06-14 20:51:13 +00:00
Manman Ren 2764301a77 Revert: test/CodeGen/ARM/iabs.ll in r158441
Sorry that I accidently checked in this file with my previous commit.

llvm-svn: 158442
2012-06-14 06:04:02 +00:00
Manman Ren c2bc2d106b InstCombine: fix a bug when combining (fcmp cc0 x, y) && (fcmp cc1 x, y).
uno && ueq was converted to ueq, it should be converted to uno.

llvm-svn: 158441
2012-06-14 05:57:42 +00:00
Akira Hatanaka c6496e2cb6 Test case for MIPS long branch pass.
llvm-svn: 158438
2012-06-14 02:12:21 +00:00
Akira Hatanaka 843aca9328 Fix test cases.
llvm-svn: 158435
2012-06-14 01:21:00 +00:00
Akira Hatanaka df5205ef3d Implement a DAGCombine in MipsISelLowering.cpp which transforms the following
pattern:

(add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt))

"tjt" is a TargetJumpTable node. 

llvm-svn: 158419
2012-06-13 20:33:18 +00:00
Akira Hatanaka 1daf8c2a16 Set a higher value for maxStoresPerMemcpy in MipsISelLowering.cpp.
llvm-svn: 158414
2012-06-13 19:33:32 +00:00
Akira Hatanaka f0273603f5 Implement fastcc calling convention for MIPS.
llvm-svn: 158410
2012-06-13 18:06:00 +00:00
Richard Osborne ab7d788eb5 Fix pattern for MKMSK instruction.
llvm-svn: 158409
2012-06-13 17:59:12 +00:00
Craig Topper 71dc02d659 Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them.
llvm-svn: 158396
2012-06-13 07:18:53 +00:00
Akira Hatanaka 5fa541231b disable use of directive .set nomicromips
until this directive is pushed in gas to open source fsf

Patch by Reed Kotler.

llvm-svn: 158381
2012-06-13 02:41:14 +00:00
Andrew Trick 344fb64fa3 sched: fix latency of memory dependence chain edges for consistency.
For store->load dependencies that may alias, we should always use
TrueMemOrderLatency, which may eventually become a subtarget hook. In
effect, we should guarantee at least TrueMemOrderLatency on at least
one DAG path from a store to a may-alias load.

This should fix the standard mode as well as -enable-aa-sched-mi".

llvm-svn: 158380
2012-06-13 02:39:03 +00:00
Chad Rosier c6916f88a8 [arm-fast-isel] Add support for -arm-long-calls.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 158368
2012-06-12 19:25:13 +00:00
Jakob Stoklund Olesen e782fa649f Fix test that depends on register allocation.
The test is really checking the prolog/epilog load/store multiple
formation.

llvm-svn: 158328
2012-06-11 21:14:28 +00:00
Jakob Stoklund Olesen 4e28777465 Fix test case to work on ARM.
Patch by James Benton!

llvm-svn: 158316
2012-06-11 16:01:14 +00:00
Bill Wendling 4b79647a6e Re-enable the CMN instruction.
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302
2012-06-11 08:07:26 +00:00
Hal Finkel 4e9f1a859f Enable ILP scheduling for all nodes by default on PPC.
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296
2012-06-10 19:32:29 +00:00
Hal Finkel 2edfbddcf0 Improve ext/trunc patterns on PPC64.
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283
2012-06-09 22:10:19 +00:00
Craig Topper 3352ba55b9 Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument.
llvm-svn: 158278
2012-06-09 16:46:13 +00:00
Hal Finkel eb50c2d4a4 Enable tail merging on PPC.
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259
2012-06-09 03:14:50 +00:00
Jakob Stoklund Olesen 33a1b416ac Don't run RAFast in the optimizing regalloc pipeline.
The fast register allocator is not supposed to work in the optimizing
pipeline. It doesn't make sense to compute live intervals, run full copy
coalescing, and then run RAFast.

Fast register allocation in the optimizing pipeline is better done by
RABasic.

llvm-svn: 158242
2012-06-08 23:15:12 +00:00
Hal Finkel c6b5debb40 Enable PPC CTR loop formation by default.
Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223
2012-06-08 19:19:53 +00:00
Manman Ren bf86b295bb Test case for r158160
llvm-svn: 158218
2012-06-08 18:42:37 +00:00
Chad Rosier 3d464d8068 Fix a crash in APInt::lshr when shiftAmt > BitWidth.
Patch by James Benton <jbenton@vmware.com>.

llvm-svn: 158213
2012-06-08 18:04:52 +00:00
NAKAMURA Takumi 5412cef77d test/CodeGen/Generic/APIntLoadStore.ll: Mark as XFAIL:ppc since r157911.
llvm-svn: 158209
2012-06-08 16:28:06 +00:00
Hal Finkel 821e00121c Disable the PPC CTR-Loops pass by default.
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206
2012-06-08 15:38:25 +00:00
Hal Finkel 8b01503ee5 Fix a bug in the new PPC CTR-Loops pass.
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205
2012-06-08 15:38:23 +00:00
Hal Finkel 96c2d4d945 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Manman Ren 2cdc8afccf X86: optimize generated code for integer ABS
This patch will generate the following for integer ABS:
      movl    %edi, %eax
      negl    %eax
      cmovll  %edi, %eax
INSTEAD OF
      movl    %edi, %ecx
      sarl    $31, %ecx
      leal    (%rdi,%rcx), %eax
      xorl    %ecx, %eax

There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. 
This is implemented in PerformXorCombine.

rdar://10695237

llvm-svn: 158175
2012-06-07 22:39:10 +00:00
Rafael Espindola 55d1145bd5 Use a base register instead of an index register with the local dynamic model.
Fixes pr13048.

llvm-svn: 158158
2012-06-07 18:39:19 +00:00
Manman Ren ae02c5a93e X86: replace SUB with CMP if possible
This patch will optimize the following
    movq    %rdi, %rax
    subq    %rsi, %rax
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax
to
    cmpq    %rsi, %rdi
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 158126
2012-06-07 00:42:47 +00:00
Manman Ren 9c9641812c Revert r157755.
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.

llvm-svn: 158122
2012-06-06 23:53:03 +00:00
Chad Rosier 5d6f01ad77 Add support for dynamic stack realignment in the presence of dynamic allocas on
X86.
rdar://11496434

llvm-svn: 158087
2012-06-06 17:37:40 +00:00
Joel Jones 7f2ac7a2c8 Revert commit r157966
llvm-svn: 157972
2012-06-05 00:47:21 +00:00
Joel Jones d08534f82e This change handles a another case for generating the bic instruction
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.

<rdar://problem/11481151>

llvm-svn: 157966
2012-06-04 23:38:57 +00:00
Akira Hatanaka 3ee0405231 Add a test case for mips64 unaligned load/store instructions.
llvm-svn: 157939
2012-06-04 17:57:06 +00:00
Akira Hatanaka b964932e70 Rename test/CodeGen/Mips/load-shift-left-right.ll.
llvm-svn: 157938
2012-06-04 17:50:36 +00:00
Roman Divacky e3f15c98d1 Implement local-exec TLS on PowerPC.
llvm-svn: 157935
2012-06-04 17:36:38 +00:00
Nadav Rotem b7bb72e4f3 Remove the "-promote-elements" flag. This flag is now enabled by default.
llvm-svn: 157925
2012-06-04 11:27:21 +00:00
Hal Finkel 595817eebe Enable generating PPC pre-increment (r+imm) instructions by default.
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.

llvm-svn: 157911
2012-06-04 02:21:00 +00:00
Craig Topper 79dbb0c6e4 Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang.
llvm-svn: 157903
2012-06-03 18:58:46 +00:00
Craig Topper fd53b80219 Rename fma4 intrinsics to just fma since they are now used for both FMA4 and FMA3. Autoupgrade support coming in a separate commit.
llvm-svn: 157898
2012-06-03 07:26:46 +00:00
Manman Ren 5097e4f38a Revert r157831
llvm-svn: 157896
2012-06-03 03:14:24 +00:00
Craig Topper 29eafea292 Use sse_load_f32/64 for scalar FMA3 intrinsic patterns instead of 128-bit loads to match instruction behavior.
llvm-svn: 157895
2012-06-03 01:40:43 +00:00
Manman Ren be10421c17 ARM: add testing case for struct byval
rdar://9877866

llvm-svn: 157876
2012-06-02 05:37:44 +00:00
Akira Hatanaka 27512b167b Add another test case which tests Mips' unaligned load/store instructions.
llvm-svn: 157874
2012-06-02 01:13:10 +00:00
Akira Hatanaka 63c0e2c58c Fix test cases in test/CodeGen/Mips.
llvm-svn: 157868
2012-06-02 00:05:45 +00:00
Manman Ren 879ca9d47d X86: peephole optimization to remove cmp instruction
This patch will optimize the following:
  sub r1, r3
  cmp r3, r1 or cmp r1, r3
  bge L1
TO
  sub r1, r3
  bge L1 or ble L1

If the branch instruction can use flag from "sub", then we can eliminate
the "cmp" instruction.

llvm-svn: 157831
2012-06-01 19:49:33 +00:00
Chris Lattner b1359894f3 testcase for PR13006, thanks to Duncan for filing it.
llvm-svn: 157824
2012-06-01 18:19:46 +00:00
Hans Wennborg 789acfb63d Implement the local-dynamic TLS model for x86 (PR3985)
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.

llvm-svn: 157818
2012-06-01 16:27:21 +00:00
Craig Topper 00649d5111 Remove fadd(fmul) patterns for FMA3. This needs to be implemented by paying attention to FP_CONTRACT and matching @llvm.fma which is not available yet. This will allow us to enablle intrinsic use at least though.
llvm-svn: 157804
2012-06-01 06:07:48 +00:00
Chris Lattner 466076b95f enhance the logic for looking through tailcalls to look through transparent casts
in multiple-return value scenarios, like what happens on X86-64 when returning
small structs.

llvm-svn: 157800
2012-06-01 05:29:15 +00:00
Chris Lattner 182fe3eef1 enhance getNoopInput to know about vector<->vector bitcasts of legal
types, as well as int<->ptr casts.  This allows us to tailcall functions
with some trivial casts between the call and return (i.e. because the
return types disagree).

llvm-svn: 157798
2012-06-01 05:16:33 +00:00
Chris Lattner 22afea7689 add some simple 64-bit tail call tests.
llvm-svn: 157797
2012-06-01 05:03:31 +00:00
Chris Lattner 21b1e6bbdc merge some tests.
llvm-svn: 157795
2012-06-01 05:00:54 +00:00
Chris Lattner d82ae12d8c rename test
llvm-svn: 157794
2012-06-01 04:58:50 +00:00
Owen Anderson ff458f89aa Make this testcase independent of register allocation.
llvm-svn: 157761
2012-05-31 18:07:02 +00:00
Manman Ren 9bccb64e56 X86: replace SUB with CMP if possible
This patch will optimize the following
        movq    %rdi, %rax
        subq    %rsi, %rax
        cmovsq  %rsi, %rdi
        movq    %rdi, %rax
to
        cmpq    %rsi, %rdi
        cmovsq  %rsi, %rdi
        movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 157755
2012-05-31 17:20:29 +00:00
Elena Demikhovsky 602f3a26d6 Added FMA3 Intel instructions.
I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks.
I added tests for GodeGen and intrinsics.
I did not change llvm.fma.f32/64 - it may be done later.

llvm-svn: 157737
2012-05-31 09:20:20 +00:00
Craig Topper c1ac05dad5 Add intrinsic for pclmulqdq instruction.
llvm-svn: 157731
2012-05-31 04:37:40 +00:00
Jakob Stoklund Olesen 05e2245fc6 Prioritize smaller register classes for urgent evictions.
It helps compile exotic inline asm. In the test case, normal GR32
virtual registers use up eax-edx so the final GR32_ABCD live range has
no registers left. Since all the live ranges were tiny, we had no way of
prioritizing the smaller register class.

This patch allows tiny unspillable live ranges to be evicted by tiny
unspillable live ranges from a smaller register class.

<rdar://problem/11542429>

llvm-svn: 157715
2012-05-30 21:46:58 +00:00
Eric Christopher f481ab3877 Add support for the mips inline asm 'm' output modifier.
Patch by Jack Carter.

llvm-svn: 157709
2012-05-30 19:05:19 +00:00
Owen Anderson 0eda3e1de6 Switch the canonical FMA term operand order to match both the comment I wrote and the usual LLVM convention.
llvm-svn: 157708
2012-05-30 18:54:50 +00:00
Owen Anderson c7aaf523e1 Teach DAGCombine to canonicalize the position of a constant in the term operands of an FMA node.
llvm-svn: 157707
2012-05-30 18:50:39 +00:00
Chris Lattner 1622a99e58 it's pointed out that R11 can be used for magic things, and doing things just for 64-bit registers is silly. Just optimize 3 more.
llvm-svn: 157699
2012-05-30 18:08:02 +00:00
Chris Lattner 04d722a68d Extend the (abi-irrelevant) return convention to be able to return more than two values in
integer registers.  This is already supported by the fastcc convention, but it doesn't
hurt to support it in the standard conventions as well.

In cases where we can cheat at the calling convention, this allows us to avoid returning
things through memory in more cases.

llvm-svn: 157698
2012-05-30 17:50:14 +00:00
Chad Rosier 820d248c4d [arm-fast-isel] Add support for the llvm.frameaddress() intrinsic.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 157696
2012-05-30 17:23:22 +00:00
Evan Cheng bc2453dd3d Teach taildup to update livein set. rdar://11538365
llvm-svn: 157663
2012-05-30 00:42:39 +00:00
Bob Wilson 33e5188c27 Add an insertPass API to TargetPassConfig. <rdar://problem/11498613>
Besides adding the new insertPass function, this patch uses it to
enhance the existing -print-machineinstrs so that the MachineInstrs
after a specific pass can be printed.

Patch by Bin Zeng!

llvm-svn: 157655
2012-05-30 00:17:12 +00:00
Benjamin Kramer ef479ea854 Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions.
This required light surgery on the assembler and disassembler
because the instructions use an uncommon encoding. They are
the only two instructions in x86 that use register operands
and two immediates.

llvm-svn: 157634
2012-05-29 19:05:25 +00:00
Peter Collingbourne 913869be45 Add llvm.fabs intrinsic.
llvm-svn: 157594
2012-05-28 21:48:37 +00:00
Chris Lattner f7f59b15aa These tests used intrinsics with the wrong prototype. They weren't caught because
the old verifier just checked that something "was a pointer", but not that the pointee
was correct.

llvm-svn: 157544
2012-05-27 19:35:41 +00:00
Benjamin Kramer f2beccf6b4 SelectionDAGBuilder: When emitting small compare chains for switches order them by using edge weights.
SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move
the most likely condition to the front so it is checked first and the others can
be skipped. This is currently not as effective as it could be because SimplifyCFG
destroys profiling metadata when merging branches and switches. Merging branch
weight metadata is tricky though.

This code touches at most 3 cases so I didn't use a proper sorting algorithm.

llvm-svn: 157521
2012-05-26 20:01:32 +00:00
Justin Holewinski c98041d4d9 [NVPTX] Add a new test case for the newly-enabled call handling
NV_CONTRIB

llvm-svn: 157485
2012-05-25 17:20:38 +00:00
NAKAMURA Takumi 3eca973bf8 test/CodeGen/X86/bigstructret.ll: Suppress one test. It is msvc-incompatible. (compatible to mingw32 and netbsd, though)
llvm-svn: 157474
2012-05-25 15:40:54 +00:00
NAKAMURA Takumi 501dbd06ae test/CodeGen/X86/bigstructret.ll: Relax stack offsets for hosts of stack-align=8, eg. win32 and netbsd.
llvm-svn: 157471
2012-05-25 15:12:21 +00:00
Eli Friedman 315a0c79f3 Simplify code for calling a function where CanLowerReturn fails, fixing a small bug in the process.
llvm-svn: 157446
2012-05-25 00:09:29 +00:00
David Blaikie c575c80c3b Fix for CHECK-NOT misspelling.
Patch by Nicklas Bo Jensen.

llvm-svn: 157421
2012-05-24 22:08:29 +00:00
Justin Holewinski 907f7606f2 Remove the PTX back-end and all of its artifacts (triple, etc.)
This back-end was deprecated in favor of the NVPTX back-end.

NV_CONTRIB

llvm-svn: 157417
2012-05-24 21:38:21 +00:00
Akira Hatanaka a649cc75b3 Turn on mips16 pseudo op when compiling for mips16.
Expand test case for this.

Patch by Reed Kotler.

llvm-svn: 157410
2012-05-24 18:37:43 +00:00
Akira Hatanaka df98a7a34d Enable Mips16 compiler to compile a null program.
First code from the Mips16 compiler. Includes trivial test program.

Patch by Reed Kotler.

llvm-svn: 157408
2012-05-24 18:32:33 +00:00
Jakob Stoklund Olesen 41ebcda8f4 Add a test case for global live range splitting.
llvm-svn: 157357
2012-05-23 23:42:23 +00:00
Jakob Stoklund Olesen 0ce90494e6 Add a last resort tryInstructionSplit() to RAGreedy.
Live ranges with a constrained register class may benefit from splitting
around individual uses. It allows the remaining live range to use a
larger register class where it may allocate. This is like spilling to a
different register class.

This is only attempted on constrained register classes.

<rdar://problem/11438902>

llvm-svn: 157354
2012-05-23 22:37:27 +00:00
Jakob Stoklund Olesen 5b8f476037 Correctly deal with identity copies in RegisterCoalescer.
Now that the coalescer keeps live intervals and machine code in sync at
all times, it needs to deal with identity copies differently.

When merging two virtual registers, all identity copies are removed
right away. This means that other identity copies must come from
somewhere else, and they are going to have a value number.

Deal with such copies by merging the value numbers before erasing the
copy instruction. Otherwise, we leave dangling value numbers in the live
interval.

This fixes PR12927.

llvm-svn: 157340
2012-05-23 20:21:06 +00:00
Chad Rosier 223faf719c [arm-fast-isel] Add support for non-global callee.
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 157336
2012-05-23 18:38:57 +00:00
Nuno Lopes ad40c0a425 revert my previous patches that introduced an additional parameter to the objectsize intrinsic.
After a lot of discussion, we realized it's not the best option for run-time bounds checking

llvm-svn: 157255
2012-05-22 15:25:31 +00:00
Jakob Stoklund Olesen 924279ca0e Only erase virtregs with no uses left.
Also make sure registers aren't erased twice if the dead def mentions
the register twice.

This fixes PR12911.

llvm-svn: 157254
2012-05-22 14:52:12 +00:00
Jim Grosbach da04fa0d02 FileCheck'ize test, and add a bit to test for r157221.
llvm-svn: 157222
2012-05-21 23:50:00 +00:00
Craig Topper e88f2fd4f7 Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces.
llvm-svn: 157175
2012-05-21 06:40:16 +00:00
Peter Collingbourne 8eb05fd093 When legalising shifts, do not pre-build a list of operands which
may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve
the other operands when calling UpdateNodeOperands.  Fixes PR12889.

llvm-svn: 157162
2012-05-20 18:36:15 +00:00
Hal Finkel 601f555eee Add a missing PPC 64-bit stwu pattern.
This seems to fix the remaining compile-time failures on PPC64 when
compiling with -enable-ppc-preinc.

llvm-svn: 157159
2012-05-20 17:11:24 +00:00
Jakob Stoklund Olesen 691ae3388f Use the right register class for LDRrs.
llvm-svn: 157152
2012-05-20 06:38:47 +00:00
Jakob Stoklund Olesen 4fd0e4f415 Transfer memory operands to the right instruction.
They need to go on the PICLDR as the verifier points out.

llvm-svn: 157151
2012-05-20 06:38:42 +00:00
Jakob Stoklund Olesen 1f1c6add10 Properly constrain register classes for sub-registers.
Not all GR64 registers have sub_8bit sub-registers.

llvm-svn: 157150
2012-05-20 06:38:37 +00:00
Jakob Stoklund Olesen a103a516c6 Properly constrain register classes in 2-addr.
X86 has 2-addr instructions with different constraints on the tied def
and use operands. One is GR32, one is GR32_NOSP.

llvm-svn: 157149
2012-05-20 06:38:32 +00:00
Jakob Stoklund Olesen a34a69ce0c Fix 12892.
Dead code elimination during coalescing could cause a virtual register
to be split into connected components. The following rewriting would be
confused about the already joined copies present in the code, but
without a corresponding value number in the live range.

Erase all joined copies instantly when joining intervals such that the
MI and LiveInterval representations are always in sync.

llvm-svn: 157135
2012-05-19 23:34:59 +00:00
Jakob Stoklund Olesen 25ced18407 Erase joined copies immediately.
The late dead code elimination is no longer necessary.

The test changes are cause by a register hint that can be either %rdi or
%rax. The choice depends on the use list order, which this patch changes.

llvm-svn: 157131
2012-05-19 20:54:07 +00:00
Nadav Rotem c93e91da27 On Haswell, perfer storing YMM registers using a single instruction.
llvm-svn: 157129
2012-05-19 20:30:08 +00:00
Nadav Rotem 900c7cb7ce Add support for additional in-reg vbroadcast patterns
llvm-svn: 157127
2012-05-19 19:57:37 +00:00
Eric Christopher bc5d24999c Add support for the 'd' mips inline asm output modifier.
Patch by Jack Carter.

llvm-svn: 157093
2012-05-19 00:51:56 +00:00
Jim Grosbach 4b63d2ae1d Refactor data-in-code annotations.
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.

Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.

data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"

The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.

rdar://11459456

llvm-svn: 157062
2012-05-18 19:12:01 +00:00
Eric Christopher 9ca26cfb5f Add support for the mips 'x' inline asm modifier.
Patch by Jack Carter.

llvm-svn: 157057
2012-05-18 17:39:35 +00:00
Craig Topper 92db928ee9 Simplify handling of v16i8 shuffles and fix a missed optimization.
llvm-svn: 157043
2012-05-18 06:42:06 +00:00
Evan Cheng 22d405f57b Teach two-address pass to update the "source" map so it doesn't perform a
non-profitable commute using outdated info. The test case would still fail
because of poor pre-RA schedule. That will be fixed by MI scheduler.

rdar://11472010

llvm-svn: 157038
2012-05-18 01:33:51 +00:00
Jakob Stoklund Olesen 874e401382 Remove a test that was only testing for physreg joining.
This is the same as the other tests: Clever tricks are required to make
the arguments and return value line up in a single-instruction function.
It rarely happens in real life.

We have plenty other examples of this behavior.

llvm-svn: 157030
2012-05-18 00:07:14 +00:00
Jakob Stoklund Olesen 589c6eb95c Remove -join-physregs from the test suite.
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.

The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.

The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.

llvm-svn: 157027
2012-05-17 23:44:19 +00:00
Tim Northover af501a29d3 Remove incorrect pattern for ARM SMML instruction.
Patch by Meador Inge.

llvm-svn: 156989
2012-05-17 13:12:13 +00:00
Evan Cheng 58a95f0c8a Avoid creating a cycle when folding load / op with flag / store. PR11451474. rdar://11451474
llvm-svn: 156896
2012-05-16 01:54:27 +00:00
Jakob Stoklund Olesen 984997b3a0 Enable sub-sub-register copy coalescing.
It is now possible to coalesce weird skewed sub-register copies by
picking a super-register class larger than both original registers. The
included test case produces code like this:

  vld2.32 {d16, d17, d18, d19}, [r0]!
  vst2.32 {d18, d19, d20, d21}, [r0]

We still perform interference checking as if it were a normal full copy
join, so this is still quite conservative. In particular, the f1 and f2
functions in the included test case still have remaining copies because
of false interference.

llvm-svn: 156878
2012-05-15 23:31:35 +00:00
Sirish Pande 91856a1f15 Enable all Hexagon tests.
llvm-svn: 156824
2012-05-15 16:13:12 +00:00
Jakob Stoklund Olesen dc2e0cd44a Fix PR12821.
RAFast must add an <imp-def> operand when it is rewriting a sub-register
def that isn't a read-modify-write.

llvm-svn: 156777
2012-05-14 21:10:25 +00:00
Brendon Cahoon f6b687e5d1 Revert 156634 upon request until code improvement changes are made.
llvm-svn: 156775
2012-05-14 19:35:42 +00:00
Dan Gohman 164fe18cfe Rename @llvm.debugger to @llvm.debugtrap.
llvm-svn: 156774
2012-05-14 18:58:10 +00:00
Sirish Pande 4bd20c50eb Support for Hexagon feature, New Value Jump.
llvm-svn: 156698
2012-05-12 05:10:30 +00:00
Akira Hatanaka 763ab85690 Fix test cases.
llvm-svn: 156697
2012-05-12 03:25:16 +00:00
Akira Hatanaka 8f3573034b Make the following changes in MipsAsmPrinter.cpp:
- Remove code which lowers pseudo SETGP01.
- Fix LowerSETGP01. The first two of the three instructions that are emitted to
  initialize the global pointer register now use register $2.
- Stop emitting .cpload directive.

llvm-svn: 156689
2012-05-12 00:48:43 +00:00
Akira Hatanaka d918f77ba3 Insert instructions to the entry basic block which initializes the global
pointer register. 


This is the first of the series of patches which clean up the way global pointer
register is used. The patches will make the following improvements:

- Make $gp an allocatable temporary register rather than reserving it.
- Use a virtual register as the global pointer register and let the register
  allocator decide which register to assign to it or whether spill/reloads are
  needed.
- Make sure $gp is valid at the entry of a called function, which is necessary
  for functions using lazy binding.
- Remove the need for emitting .cprestore and .cpload directives.

llvm-svn: 156671
2012-05-12 00:17:17 +00:00
Akira Hatanaka 0661b81bca Do not replace operands of pseudo instructions with register $zero.
llvm-svn: 156663
2012-05-11 23:22:18 +00:00
Akira Hatanaka 5d60c36f37 Use regular expression to match register names.
llvm-svn: 156656
2012-05-11 23:00:40 +00:00
Chad Rosier aa9cb9df59 [fast-isel] Add support for selecting @llvm.trap().
llvm-svn: 156646
2012-05-11 21:33:49 +00:00
Brendon Cahoon 31f8723ef3 Hexagon constant extender support.
Patch by Jyotsna Verma.

llvm-svn: 156634
2012-05-11 19:56:59 +00:00
Chad Rosier 3268692aa8 [fast-isel] Remove -disable-arm-fast-isel option. -fast-isel=0 suffices. Minor cleanup.
llvm-svn: 156632
2012-05-11 19:40:25 +00:00
Chad Rosier 90f9afe659 [fast-isel] Cleaner fix for when we're unable to handle a non-double multi-reg
retval.  Hoists check before emitting the call to avoid unnecessary work.
rdar://11430407
PR12796

llvm-svn: 156628
2012-05-11 18:51:55 +00:00
Hans Wennborg addad7388d Fix test/CodeGen/X86/tls-pie.ll.
llvm-svn: 156612
2012-05-11 10:19:54 +00:00
Hans Wennborg f9d0e44b82 Implement initial-exec TLS model for 32-bit PIC x86
This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong
code here (see the update to test/CodeGen/X86/tls-pie.ll).

llvm-svn: 156611
2012-05-11 10:11:01 +00:00
Manman Ren dc8ad0058f ARM: peephole optimization to remove cmp instruction
This patch will optimize the following cases:
  sub r1, r3 | sub r1, imm
  cmp r3, r1 or cmp r1, r3 | cmp r1, imm
  bge L1

TO
  subs r1, r3
  bge  L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411
llvm-svn: 156599
2012-05-11 01:30:47 +00:00
Dan Gohman dfab443ae8 Define a new intrinsic, @llvm.debugger. It will be similar to __builtin_trap(),
but it generates int3 on x86 instead of ud2.

llvm-svn: 156593
2012-05-11 00:19:32 +00:00
Eric Christopher ed51b9ec0b Add support for the 'X' inline asm operand modifier.
Patch by Jack Carter.

llvm-svn: 156577
2012-05-10 21:48:22 +00:00
Sirish Pande 69295b8963 Hexagon V5 FP Support.
llvm-svn: 156568
2012-05-10 20:20:25 +00:00
Manman Ren b555b382bd Revert: 156550 "ARM: peephole optimization to remove cmp instruction"
This commit broke an external linux bot and gave a compile-time warning.

llvm-svn: 156556
2012-05-10 18:49:43 +00:00
Manman Ren c860887b2d ARM: peephole optimization to remove cmp instruction
This patch will optimize the following cases:
  sub r1, r3 | sub r1, imm
  cmp r3, r1 or cmp r1, r3 | cmp r1, imm
  bge L1

TO
  subs r1, r3
  bge  L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411
llvm-svn: 156550
2012-05-10 16:48:21 +00:00
Nadav Rotem 15946e50c1 AVX2: Add an additional broadcast idiom.
llvm-svn: 156540
2012-05-10 12:39:13 +00:00
Nadav Rotem b86a3fb8d0 Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program.
Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users.

Fix PR11900.

llvm-svn: 156539
2012-05-10 12:22:05 +00:00
Danil Malyshev 47aba39004 Added a regress test for the bug #9964 before close it.
This bug was fixed by Jim Grosbach in #138879, thanks Jim!

llvm-svn: 156505
2012-05-09 19:07:04 +00:00
Nuno Lopes 01547b3ad2 change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept.
This commit only adds the parameter. Code taking advantage of it will follow.

llvm-svn: 156473
2012-05-09 15:52:43 +00:00
Akira Hatanaka ca41d13bbd Add another peephole pattern for conditional moves.
llvm-svn: 156460
2012-05-09 02:29:29 +00:00
Akira Hatanaka 05b9dad1e6 Make register FP allocatable if the compiled function does not have dynamic
allocas.

llvm-svn: 156458
2012-05-09 01:38:13 +00:00
Akira Hatanaka 0a8ab718cb Expand 64-bit shifts if target ABI is O32.
llvm-svn: 156457
2012-05-09 00:55:21 +00:00
Craig Topper 7daf897678 Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit.
llvm-svn: 156375
2012-05-08 06:58:15 +00:00
Owen Anderson ab63d84252 Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled.
llvm-svn: 156324
2012-05-07 20:51:25 +00:00
Chad Rosier d8287fec17 Fix a regression from r147481. This combine should only happen if there is a
single use.
rdar://11360370

llvm-svn: 156316
2012-05-07 18:47:44 +00:00
Manman Ren ef4e0479ec X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;

rdar: 10961709
llvm-svn: 156312
2012-05-07 18:06:23 +00:00
Eric Christopher 9c492e6ebf Add support for the 'l' constraint.
Patch by Jack Carter.

llvm-svn: 156294
2012-05-07 06:25:15 +00:00
Eric Christopher e3c494de82 Add support for the 'c' constraint.
Patch by Jack Carter.

llvm-svn: 156293
2012-05-07 06:25:10 +00:00
Eric Christopher c18ae4a3b1 Add support for the 'P' constraint.
Patch by Jack Carter.

llvm-svn: 156292
2012-05-07 06:25:02 +00:00
Eric Christopher 470578a91b Add support for the 'O' constraint.
Patch by Jack Carter.

llvm-svn: 156285
2012-05-07 05:46:48 +00:00
Eric Christopher e07aa430b8 Add support for the 'N' inline asm constraint.
Patch by Jack Carter.

llvm-svn: 156284
2012-05-07 05:46:43 +00:00
Eric Christopher 1109b3406d Add support for the 'L' inline asm constraint.
Patch by Jack Carter.

llvm-svn: 156283
2012-05-07 05:46:37 +00:00
Eric Christopher 3ff88a05b7 Add support for the inline asm constraint 'K'.
llvm-svn: 156282
2012-05-07 05:46:29 +00:00
Craig Topper d4e1894ec1 Add SSE4A MOVNTSS/MOVNTSD instructions.
llvm-svn: 156281
2012-05-07 05:36:19 +00:00
Eric Christopher 7201e1b4b9 Support the 'J' constraint.
Patch by Jack Carter.

llvm-svn: 156280
2012-05-07 03:13:42 +00:00
Eric Christopher 1d6c89eea1 Add support for the 'I' inline asm constraint. Also add tests
from the previous 2 patches.

Patch by Jack Carter.

llvm-svn: 156279
2012-05-07 03:13:32 +00:00
Benjamin Kramer 3d38c17b59 Switch the select to branch transformation on by default.
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!

llvm-svn: 156258
2012-05-06 14:25:16 +00:00
Benjamin Kramer 047d7ca0b1 CodeGenPrepare: Add a transform to turn selects into branches in some cases.
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:

	ucomisd	(%rdi), %xmm0
	cmovbel	%edx, %esi

cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.

This patch uses a dumb conservative heuristic, it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-Order architectures are
unlikely to benefit as well, those are included in the
"predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.


Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case to him and Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)

llvm-svn: 156234
2012-05-05 12:49:22 +00:00
Justin Holewinski ae556d3ef7 This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:

nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.

NV_CONTRIB

llvm-svn: 156196
2012-05-04 20:18:50 +00:00
Sebastian Pop 2420e8b7d5 Added missing CMN case in Thumb2SizeReduction pass so that LLVM emits 16-bits encoding of CMN instructions.
llvm-svn: 156195
2012-05-04 19:53:56 +00:00
Craig Topper 42f2182366 Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
llvm-svn: 156156
2012-05-04 04:44:49 +00:00
Sirish Pande f8e5e3c072 Support for target dependent Hexagon VLIW packetizer.
This patch creates and optimizes packets as per Hexagon ISA rules.

llvm-svn: 156109
2012-05-03 21:52:53 +00:00
Craig Topper 315a5cc789 Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982.
llvm-svn: 156059
2012-05-03 07:12:59 +00:00
Evan Cheng b64e7b778b Fix two-address pass's aggressive instruction commuting heuristics. It's meant
to catch cases like:
 %reg1024<def> = MOV r1
 %reg1025<def> = MOV r0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

By commuting ADD, it let coalescer eliminate all of the copies. However, there
was a bug in the heuristics where it ended up commuting the ADD in:

 %reg1024<def> = MOV r0
 %reg1025<def> = MOV 0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

That did no benefit but rather ensure the last MOV would not be coalesced.

rdar://11355268

llvm-svn: 156048
2012-05-03 01:45:13 +00:00
Owen Anderson 41b0665b5b Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
llvm-svn: 156029
2012-05-02 22:17:40 +00:00
Owen Anderson b5f167c660 Teach DAG combine that multiplication by 1.0 can always be constant folded.
llvm-svn: 156023
2012-05-02 21:32:35 +00:00
Manman Ren f02efc8731 Revert r155853
The commit is intended to fix rdar://10961709.
But it is the root cause of PR12720.
Revert it for now.

llvm-svn: 155992
2012-05-02 15:24:32 +00:00
Craig Topper c73bc39c22 Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter.
llvm-svn: 155982
2012-05-02 08:03:44 +00:00
Bill Wendling b6b50c6638 Strip the pointer casts off of allocas so that the selection DAG can find them.
PR10799

llvm-svn: 155954
2012-05-01 22:50:45 +00:00
Manman Ren 425a55c1ce X86: optimization for max-like struct
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0

FROM
movl    %edi, %ecx
subl    %esi, %ecx
cmpl    %edi, %esi
movl    $0, %eax
cmovll  %ecx, %eax
TO
xorl    %eax, %eax
subl    %esi, %edi
cmovll  %eax, %edi
movl    %edi, %eax

rdar: 10734411
llvm-svn: 155919
2012-05-01 17:16:15 +00:00
Jay Foad 8fc810c2ef Regression test for PR2960.
llvm-svn: 155912
2012-05-01 11:11:34 +00:00
Manman Ren 4f4d5c8fc8 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

llvm-svn: 155853
2012-04-30 22:51:25 +00:00
Manman Ren 5b7e08c9d8 test/CodeGen/X86/select.ll: remove spaces
llvm-svn: 155840
2012-04-30 18:54:27 +00:00
Derek Schuff b051adf263 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

(this time, actually commit what was reviewed!)

llvm-svn: 155825
2012-04-30 16:57:15 +00:00
Bob Wilson 9245c93656 Don't introduce illegal types when creating vmull operations. <rdar://11324364>
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal.  Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.

llvm-svn: 155824
2012-04-30 16:53:34 +00:00
Andrew Trick 833f04962a Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice.
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.

Fixes rdar://11314175: BuildSchedUnits assert.

llvm-svn: 155749
2012-04-28 01:03:23 +00:00
Derek Schuff a99b168145 Revert r155745
llvm-svn: 155746
2012-04-27 23:37:41 +00:00
Derek Schuff bbf8b83e90 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

llvm-svn: 155745
2012-04-27 23:27:17 +00:00
Andrew Trick 7a773ec053 Temporarily revert r155668: Fix the SD scheduler to avoid gluing.
This definitely caused regression with ARM -mno-thumb.

llvm-svn: 155743
2012-04-27 22:55:59 +00:00