Commit Graph

17126 Commits

Author SHA1 Message Date
Roman Divacky c9e23d93ae This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.                                            
                                                                                                                   
 * With the 64-bit ABI, the save location is at a fixed offset of 8 from                                           
the stack pointer.  The frame pointer cannot be used to access this                                                
portion of the stack frame since the distance from the frame pointer may                                           
change with alloca calls.                                                                                          
                                                                                                                   
 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created                                           
if any of CR2, CR3, and CR4 were modified.                                                                      
                                                                                                                   
 * For both ABIs, save/restore logic is generated only if one of the     
nonvolatile CR fields were modified.                                   

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!

llvm-svn: 163713
2012-09-12 14:47:47 +00:00
Kristof Beyls e6b876f4e5 Fix constant folding through bitcasts by no longer relying on undefined behaviour (converting NaN values between float and double).
SelectionDAG::getConstantFP(double Val, EVT VT, bool isTarget);
should not be used when Val is not a simple constant (as the comment in
SelectionDAG.h indicates). This patch avoids using this function
when folding an unknown constant through a bitcast, where it cannot be
guaranteed that Val will be a simple constant.

llvm-svn: 163703
2012-09-12 11:25:02 +00:00
Nadav Rotem 8ff00989fc Stack coloring: remove lifetime intervals which contain escaped allocas.
The input program may contain intructions which are not inside lifetime
markers. This can happen due to a bug in the compiler or due to a bug in
user code (for example, returning a reference to a local variable).
This commit adds checks that all of the instructions in the function and
invalidates lifetime ranges which do not contain all of the instructions.

llvm-svn: 163678
2012-09-12 04:57:37 +00:00
Eric Christopher 97c0fdd116 Add some support for dealing with an object pointer on arguments.
Part of rdar://9797999

llvm-svn: 163667
2012-09-12 00:26:55 +00:00
Chad Rosier 1778831a3d [ms-inline asm] Split the parsing of IR asm strings into GCC and MS variants.
Add support in the EmitMSInlineAsmStr() function for handling integer consts.

llvm-svn: 163645
2012-09-11 19:09:56 +00:00
Manman Ren 571d9e4b80 SimplifyCFG: preserve branch-weight metadata when creating a new switch from
a pair of switch/branch where both depend on the value of the same variable and
the default case of the first switch/branch goes to the second switch/branch.

Code clean up and fixed a few issues:
1> handling the case where some cases of the 2nd switch are invalidated
2> correctly calculate the weight for the 2nd switch when it is a conditional eq

Testing case is modified from Alastair's original patch.

llvm-svn: 163635
2012-09-11 17:43:35 +00:00
Chad Rosier ab51c9de34 Formatting. No functional change intended.
llvm-svn: 163627
2012-09-11 16:33:10 +00:00
Nadav Rotem 65ba95ebf9 Stack Coloring: Dont crash on dbg values which use stack frames.
llvm-svn: 163616
2012-09-11 12:34:27 +00:00
Alex Rosenberg 04b43aab43 Add a pass that renames everything with metasyntatic names. This works well after using bugpoint to reduce the confusion presented by the original names, which no longer mean what they used to.
llvm-svn: 163592
2012-09-11 02:46:18 +00:00
Eric Christopher 9fd70c8fb3 Revert r160148 it seems to cause more problems than it should
right now. We'll fix PR13303 a different way.

llvm-svn: 163570
2012-09-10 23:34:06 +00:00
Chad Rosier e27921feb8 Add newline.
llvm-svn: 163565
2012-09-10 23:09:27 +00:00
NAKAMURA Takumi 8c72306cdb test/CodeGen/X86/ms-inline-asm.ll: Relax for non-darwin x86 targets. '##InlineAsm' could not be seen in other hosts.
llvm-svn: 163554
2012-09-10 22:04:54 +00:00
Chad Rosier 7641f58784 [ms-inline asm] Properly emit the asm directives when the AsmPrinterVariant
and InlineAsmVariant don't match.

llvm-svn: 163550
2012-09-10 21:36:05 +00:00
Chad Rosier 1c1319b9e7 Update test case for Release builds.
llvm-svn: 163549
2012-09-10 21:31:43 +00:00
Chad Rosier db20a41d99 [ms-inline asm] Pass the correct AsmVariant to the PrintAsmOperand() function
and update the printOperand() function accordingly.

llvm-svn: 163544
2012-09-10 21:10:49 +00:00
Chad Rosier 6f8d8b2406 [ms-inline asm] Add support for .att_syntax directive.
llvm-svn: 163542
2012-09-10 20:54:39 +00:00
Jakob Stoklund Olesen 8b9dce5c18 Don't attempt to use flags from predicated instructions.
The ARM backend can eliminate cmp instructions by reusing flags from a
nearby sub instruction with similar arguments.

Don't do that if the sub is predicated - the flags are not written
unconditionally.

<rdar://problem/12263428>

llvm-svn: 163535
2012-09-10 19:17:25 +00:00
Nadav Rotem 3c86b78ae4 Stack Coloring: Handle the case where END markers come before BEGIN markers properly.
llvm-svn: 163530
2012-09-10 18:51:09 +00:00
Michael Liao 400f7ef871 Enhance PR11334 fix to support extload from v2f32/v4f32
- Fix an remaining issue of PR11674 as well

llvm-svn: 163528
2012-09-10 18:33:51 +00:00
Michael Liao c3d5b21c39 Add boolean simplification support from CMOV
- If a boolean value is generated from CMOV and tested as boolean value,
  simplify the use of test result by referencing the original condition.
  RDRAND intrinisc is one of such cases.

llvm-svn: 163516
2012-09-10 16:36:16 +00:00
James Molloy 1e5c611815 Fix an assertion failure when optimising a shufflevector incorrectly into concat_vectors, and a followup bug with SelectionDAG::getNode() creating nodes with invalid types.
llvm-svn: 163511
2012-09-10 14:01:21 +00:00
Nadav Rotem 6731363185 Stack Coloring: Add support for multiple regions of the same slot, within a single basic block.
llvm-svn: 163507
2012-09-10 12:39:35 +00:00
Elena Demikhovsky 264fb0217e The VPSHUFB 256-bit instruction may be generated when one of input vector is undefined or zeroinitializer.
I've added the "zeroinitializer" case in this patch.

llvm-svn: 163506
2012-09-10 12:13:11 +00:00
Nadav Rotem d753a952ca Teach the DAGBuilder about lifetime markers which are generated from PHINodes.
llvm-svn: 163494
2012-09-10 08:43:23 +00:00
Craig Topper 03f39773e0 Teach DAG combiner to constant fold fneg of a BUILD_VECTOR of constants.
llvm-svn: 163483
2012-09-09 22:58:45 +00:00
Craig Topper 4ed79bd7d7 Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled.
llvm-svn: 163473
2012-09-08 17:42:27 +00:00
Craig Topper 98f2e861a0 Add support for lowering FABS of vector types.
llvm-svn: 163461
2012-09-08 07:31:51 +00:00
Craig Topper 3e41a5bb31 Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct.
llvm-svn: 163458
2012-09-08 04:58:43 +00:00
Andrew Trick d3b4d2cb76 Remove an incorrect assert during branch weight propagation.
Patch and test case by Alastair Murray!

llvm-svn: 163437
2012-09-08 00:07:26 +00:00
Benjamin Kramer 68b9f0583f Fix alignment of .comm and .lcomm on mingw32.
For some reason .lcomm uses byte alignment and .comm log2 alignment so we can't
use the same setting for both. Fix this by reintroducing the LCOMM enum.
I verified this against mingw's gcc.

llvm-svn: 163420
2012-09-07 21:08:01 +00:00
Jack Carter 2c39b77f89 Initial relocations test for the Mips standalone assembler.
This is not an exhaustive set, but something we can build on.

Contributer: Vladimir Medic
llvm-svn: 163419
2012-09-07 20:38:18 +00:00
Benjamin Kramer 47f9ec92cb MC: Overhaul handling of .lcomm
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
  asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
  without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
  that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).

llvm-svn: 163395
2012-09-07 17:25:13 +00:00
Benjamin Kramer e3d658bb6c PR13754: llvm-mc/x86 crashes on .cfi directives without the % prefix for registers.
gas accepts this and it seems to be common enough to be worth supporting. This
doesn't affect the parsing of reg operands outside of .cfi directives.

llvm-svn: 163390
2012-09-07 14:51:35 +00:00
Nuno Lopes 1cf578d599 yet another attempt at fixing @OCAMLOPT@ for sed.
Patch by Rick Foos.

llvm-svn: 163380
2012-09-07 09:24:13 +00:00
Jack Carter 97158ca5c2 The Mips standalone assembler aliased instruction support.
The assembler can alias one instruction into another based
on the operands. For example the jump instruction "J" takes
and immediate operand, but if the operand is a register the
assembler will change it into a jump register "JR" instruction.

These changes are in the instruction td file.

Test cases included

Contributer: Vladimir Medic
llvm-svn: 163368
2012-09-07 01:42:38 +00:00
Jack Carter be33217bb8 The Mips standalone assembler intial directive support.
Actually these are just stubs for parsing the directives.
Semantic support will come later.

Test cases included

Contributer: Vladimir Medic
llvm-svn: 163364
2012-09-07 00:48:02 +00:00
Jack Carter a63b16ac1e The Mips standalone assembler fpu instruction support.
Test cases included

Contributer: Vladimir Medic
llvm-svn: 163363
2012-09-07 00:23:42 +00:00
Michael Liao 026f833368 Re-work bit/bits value resolving in tblgen
- This patch is inspired by the failure of the following code snippet
  which is used to convert enumerable values into encoding bits to
  improve the readability of td files.

  class S<int s> {
    bits<2> V = !if(!eq(s, 8),  {0, 0},
                !if(!eq(s, 16), {0, 1},
                !if(!eq(s, 32), {1, 0},
                !if(!eq(s, 64), {1, 1}, {?, ?}))));
  }

  Later, PR8330 is found to report not exactly the same bug relevant
  issue to bit/bits values.

- Instead of resolving bit/bits values separately through
  resolveBitReference(), this patch adds getBit() for all Inits and
  resolves bit value by resolving plus getting the specified bit. This
  unifies the resolving of bit with other values and removes redundant
  logic for resolving bit only. In addition,
  BitsInit::resolveReferences() is optimized to take advantage of this
  origanization by resolving VarBitInit's variable reference first and
  then getting bits from it.

- The type interference in '!if' operator is revised to support possible
  combinations of int and bits/bit in MHS and RHS.

- As there may be illegal assignments from integer value to bit, says
  assign 2 to a bit, but we only check this during instantiation in some
  cases, e.g.

  bit V = !if(!eq(x, 17), 0, 2);

  Verbose diagnostic message is generated when invalid value is
  resolveed to help locating the error.

- PR8330 is fixed as well.

llvm-svn: 163360
2012-09-06 23:32:48 +00:00
Jack Carter dc1e35d418 The Mips standalone assembler memory instruction support.
This includes sb,sc,sh,sw,lb,lw,lbu,lh,lhu,ll,lw

Test case included

Contributer: Vladimir Medic
llvm-svn: 163346
2012-09-06 20:00:02 +00:00
Jakob Stoklund Olesen 866908c42c Allow overlaps between virtreg and physreg live ranges.
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.

When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:

  %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
  %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>

We can assign %vreg11 to %ECX, overlapping the live range of %CL.

llvm-svn: 163336
2012-09-06 18:15:23 +00:00
Tim Northover 00e071ad52 Diagnose invalid alignments on duplicating VLDn instructions.
Patch by Chris Lidbury.

llvm-svn: 163323
2012-09-06 15:27:12 +00:00
Tim Northover fb3cdd83b0 Check for invalid alignment values when decoding VLDn/VSTn (single ln) instructions.
Patch by Chris Lidbury.

llvm-svn: 163321
2012-09-06 15:17:49 +00:00
Arnold Schwaighofer 8dc34cfb99 BasicAA: Recognize cyclic NoAlias phis
Enhances basic alias analysis to recognize phis whose first incoming values are
NoAlias and whose other incoming values are just the phi node itself through
some amount of recursion.

Example: With this change basicaa reports that ptr_phi and ptr_phi2 do not alias
each other.

bb:
 ptr = ptr2 + 1

loop:
  ptr_phi = phi [bb, ptr], [loop, ptr_plus_one]
  ptr2_phi = phi [bb, ptr2], [loop, ptr2_plus_one]
  ...
  ptr_plus_one = gep ptr_phi, 1
  ptr2_plus_one = gep ptr2_phi, 1

This enables the elimination of one load in code like the following:

extern int foo;

int test_noalias(int *ptr, int num, int* coeff) {
  int *ptr2 = ptr;
  int result = (*ptr++) * (*coeff--);
  while (num--) {
    *ptr2++ = *ptr;
    result +=  (*coeff--) * (*ptr++);
  }
  *ptr = foo;
  return result;
}

Part 2/2 of fix for PR13564.

llvm-svn: 163319
2012-09-06 14:41:53 +00:00
Tim Northover 262f6f564f Use correct part of complex operand to encode VST1 alignment.
Patch by Chris Lidbury.

llvm-svn: 163318
2012-09-06 14:36:55 +00:00
Arnold Schwaighofer 76dca58c66 BasicAA: GEPs of NoAlias'ing base ptr with equivalent indices are NoAlias
If we can show that the base pointers of two GEPs don't alias each other using
precise analysis and the indices and base offset are equal then the two GEPs
also don't alias each other.
This is primarily needed for the follow up patch that analyses NoAlias'ing PHI
nodes.

Part 1/2 of fix for PR13564.

llvm-svn: 163317
2012-09-06 14:31:51 +00:00
Nadav Rotem 9e3cc9f884 Disable stack coloring by default in order to resolve the i386 failures.
llvm-svn: 163316
2012-09-06 14:27:06 +00:00
Elena Demikhovsky 42777877c2 AVX2 optimization.
Added generation of VPSHUB instruction for <32 x i8> vector shuffle when possible.

llvm-svn: 163312
2012-09-06 12:42:01 +00:00
Nadav Rotem ea0d36be95 Fix the test by specifying an exact cpu model.
llvm-svn: 163307
2012-09-06 10:33:33 +00:00
Hans Wennborg feb4d07d88 Fix switch_to_lookup_table.ll test from r163302.
The lookup tables did not get built in a deterministic order.
This makes them get built in the order that the corresponding phi nodes
were found.

llvm-svn: 163305
2012-09-06 10:10:35 +00:00
James Molloy 49bdbce8e1 Improve codegen for BUILD_VECTORs on ARM.
If we have a BUILD_VECTOR that is mostly a constant splat, it is often better to splat that constant then insertelement the non-constant lanes instead of insertelementing every lane from an undef base.

llvm-svn: 163304
2012-09-06 09:55:02 +00:00
Hans Wennborg 8a62fc5294 Build lookup tables for switches (PR884)
This adds a transformation to SimplifyCFG that attemps to turn switch
instructions into loads from lookup tables. It works on switches that
are only used to initialize one or more phi nodes in a common successor
basic block, for example:

  int f(int x) {
    switch (x) {
    case 0: return 5;
    case 1: return 4;
    case 2: return -2;
    case 5: return 7;
    case 6: return 9;
    default: return 42;
  }

This speeds up the code by removing the hard-to-predict jump, and
reduces code size by removing the code for the jump targets.

llvm-svn: 163302
2012-09-06 09:43:28 +00:00
Nadav Rotem 7c277da364 Add a new optimization pass: Stack Coloring, that merges disjoint static allocations (allocas). Allocas are known to be
disjoint if they are marked by disjoint lifetime markers (@llvm.lifetime.XXX intrinsics).

llvm-svn: 163299
2012-09-06 09:17:37 +00:00
James Molloy 34e9931bec Optimize codegen for VSETLNi{8,16,32} operating on Q registers. Degenerate to a VSETLN on D registers, instead of an (INSERT_SUBREG (VSETLN (EXTRACT_SUBREG ))) sequence to help the register coalescer.
llvm-svn: 163298
2012-09-06 09:16:01 +00:00
Craig Topper daa5ed1e0a Add patterns for converting stores of subvector_extracts of lower 128-bits of a 256-bit vector to VMOVAPSmr/VMOVUPSmr.
llvm-svn: 163292
2012-09-06 05:15:01 +00:00
Jim Grosbach 2c1b00a991 Revert "Enable MCJIT tests on Darwin."
This reverts commit 163278.

Works OK on x86_64, but not i386. Will re-enable when that's cleared up.

llvm-svn: 163290
2012-09-06 03:24:09 +00:00
Jim Grosbach 0fa7c01f8f Enable MCJIT tests on Darwin.
llvm-svn: 163278
2012-09-06 00:59:06 +00:00
Jack Carter 71e6a7492e Mips specific llvm assembler support for branch and jump instructions.
Test case included.

Contributer: Vladimir Medic
llvm-svn: 163277
2012-09-06 00:43:26 +00:00
Jakob Stoklund Olesen f831059f60 Use predication instead of pseudo-opcodes when folding into MOVCC.
Now that it is possible to dynamically tie MachineInstr operands,
predicated instructions are possible in SSA form:

  %vreg3<def> = SUBri %vreg1, -2147483647, pred:14, pred:%noreg, %opt:%noreg
  %vreg4<def,tied1> = MOVCCr %vreg3<tied0>, %vreg1, %pred:12, pred:%CPSR

Becomes a predicated SUBri with a tied imp-use:

  SUBri %vreg1, -2147483647, pred:13, pred:%CPSR, opt:%noreg, %vreg1<imp-use,tied0>

This means that any instruction that is safe to move can be folded into
a MOVCC, and the *CC pseudo-instructions are no longer needed.

The test case changes reflect that Thumb2SizeReduce recognizes the
predicated instructions. It didn't understand the pseudos.

llvm-svn: 163274
2012-09-05 23:58:02 +00:00
Nick Lewycky b82c0ec5a6 Add missing file for test.
llvm-svn: 163272
2012-09-05 23:52:20 +00:00
Nick Lewycky cfc2fe9163 Teach libObject about some more ELF relocations. llvm-objdump -r now knows
every relocation in C++ hello world built with debug info.

llvm-svn: 163271
2012-09-05 23:48:54 +00:00
Manman Ren f3fedb6935 JumpThreading: when default destination is the destination of some cases in a
switch, make sure we include the value for the cases when calculating edge
value from switch to the default destination.

rdar://12241132

llvm-svn: 163270
2012-09-05 23:45:58 +00:00
Jack Carter b4dbc17acd Mips specific llvm assembler support for ALU instructions. This includes
register support. Test case included.

Contributer: Vladimir Medic
llvm-svn: 163268
2012-09-05 23:34:03 +00:00
Tim Northover c8d867d42d Strip old MachineInstrs *after* we know we can put them back.
Previous patch accidentally decided it couldn't convert a VFP to a
NEON instruction after it had already destroyed the old one. Not a
good move.

llvm-svn: 163230
2012-09-05 18:37:53 +00:00
Pranav Bhandarkar 823f9ebaa3 LLVM Bug Fix 13709: Remove needless lsr(Rp, #32) instruction access the
subreg_hireg of register pair Rp.

	* lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New
	 DenseMap similar to PeepholeMap that additionally records subreg info
	 too.
        (runOnMachineFunction): Record information in PeepholeDoubleRegsMap
        and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to
	the instruction Rx = COPY Rp1:logreg_subreg.
	* test/CodeGen/Hexagon/remove_lsr.ll: New test.
	

llvm-svn: 163214
2012-09-05 16:01:40 +00:00
Silviu Baranga 3f40d87207 Fixed the DAG combiner to better handle the folding of AND nodes for vector types. The previous code was making the assumption that the length of the bitmask returned by isConstantSplat was equal to the size of the vector type. Now we first make sure that the splat value has at least the length of the vector lane type, then we only use as many fields as we have available in the splat value.
llvm-svn: 163203
2012-09-05 08:57:21 +00:00
Logan Chien eeaaf65cb6 Fix UseInitArray option for MIPS target.
llvm-svn: 163193
2012-09-05 06:17:17 +00:00
Dan Gohman df476e5e93 Make provenance checking conservative in cases when
pointers-to-strong-pointers may be in play. These can lead to retains and
releases happening in unstructured ways, foiling the optimizer. This fixes
rdar://12150909.

llvm-svn: 163180
2012-09-04 23:16:20 +00:00
Jakob Stoklund Olesen c7579cdded Move tie checks into MachineVerifier::visitMachineOperand.
llvm-svn: 163152
2012-09-04 18:38:28 +00:00
Preston Gurd cdf540d5d6 Generic Bypass Slow Div
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder,  or both.

Patch by Tyler Nowicki!

llvm-svn: 163150
2012-09-04 18:22:17 +00:00
Sergei Larin 4d8986af12 Porting Hexagon MI Scheduler to the new API.
Change current Hexagon MI scheduler to use new converging
scheduler. Integrates DFA resource model into it.

llvm-svn: 163137
2012-09-04 14:49:56 +00:00
Arnold Schwaighofer f00fb1c581 Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 163136
2012-09-04 14:37:49 +00:00
Elena Demikhovsky cbe99bbb36 This patch optimizes shuffle instruction - generates 2 instructions instead of 4.
Since this specific shuffle is widely used in many workloads we have ~10% performance on them.

shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>

vmovaps (%rdx), %ymm0
vshufps $8, %ymm0, %ymm0, %ymm0
vmovaps (%rcx), %ymm1
vshufps $8, %ymm0, %ymm1, %ymm1
vunpcklps       %ymm0, %ymm1, %ymm0

vmovaps (%rcx), %ymm0
vmovsldup       (%rdx), %ymm1
vblendps        $85, %ymm0, %ymm1, %ymm0

llvm-svn: 163134
2012-09-04 12:49:02 +00:00
Nadav Rotem 03dcd85b56 LICM may hoist an instruction with undefined behavior above a trap.
Scan the body of the loop and find instructions that may trap.
Use this information when deciding if it is safe to hoist or sink instructions.
Notice that we can optimize the search of instructions that may throw in the case of nested loops.

rdar://11518836

llvm-svn: 163132
2012-09-04 10:25:04 +00:00
Alexey Samsonov c942e6b781 Add support for fetching inlining context (stack of source code locations)
by instruction address from DWARF.

Add --inlining flag to llvm-dwarfdump to demonstrate and test this functionality,
so that "llvm-dwarfdump --inlining --address=0x..." now works much like
"addr2line -i 0x...", provided that the binary has debug info
(Clang's -gline-tables-only *is* enough).

llvm-svn: 163128
2012-09-04 08:12:33 +00:00
Bob Wilson dcc54decd5 Fix more fallout from r158919, similar to PR13547.
This code used to only handle malloc-like calls, which do not read memory.
r158919 changed it to check isNoAliasFn(), which includes strdup-like and
realloc-like calls, but it was not checking for dependencies on the memory
read by those calls.

llvm-svn: 163106
2012-09-03 05:15:15 +00:00
Nuno Lopes 750f83a752 escape special char when handling CXX_FOR_OCAMLOPT
llvm-svn: 163098
2012-09-02 15:16:51 +00:00
Nuno Lopes 19e8933063 fix test's RUN lines
llvm-svn: 163097
2012-09-02 15:07:25 +00:00
Nadav Rotem 9d83202620 Not all targets have efficient ISel code generation for select instructions.
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.

llvm-svn: 163093
2012-09-02 12:10:19 +00:00
Benjamin Kramer 599a4bb6ea LoopRotation: Make the brute force DomTree update more brute force.
We update until we hit a fixpoint. This is probably slow but also
slightly simplifies the code. It should also fix the occasional
invalid domtrees observed when building with expensive checking.

I couldn't find a case where this had a measurable slowdown, but
if someone finds a pathological case where it does we may have
to find a cleverer way of updating dominators here.

Thanks to Duncan for the test case.

llvm-svn: 163091
2012-09-02 11:57:22 +00:00
Nadav Rotem 500d691d4a Generate better select code by allowing the target to use scalar select, and not sign-extend.
llvm-svn: 163086
2012-09-02 08:20:07 +00:00
Pete Cooper 2117ac40c9 Revert "Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060"
This reverts commit 5dd9e214fb92847e947f9edab170f9b4e52b908f.

Thanks to Duncan for explaining how this should have been done.

Conflicts:

	test/CodeGen/X86/vec_select.ll

llvm-svn: 163064
2012-09-01 17:37:55 +00:00
Logan Chien cea0354c1b Fix Thumb2 fixup kind in the integrated-as.
llvm-svn: 163063
2012-09-01 15:06:36 +00:00
Owen Anderson 90e0eaffa8 Teach DAG combine a number of tricks to simplify FMA expressions in fast-math mode.
llvm-svn: 163051
2012-09-01 06:04:27 +00:00
NAKAMURA Takumi d35a4ff88b llvm/test/CodeGen/X86/fp-fast.ll: Suppress FMA4 on AMD Bulldozer host, corresponding to r162999.
llvm-svn: 163041
2012-09-01 00:26:28 +00:00
Manman Ren 3590361bf0 Fix Atom bots for r163036.
llvm-svn: 163040
2012-09-01 00:17:06 +00:00
Manman Ren 26c5d0f607 SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure its
output chain is correctly setup.

As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.

rdar://11457792

llvm-svn: 163036
2012-08-31 23:16:57 +00:00
Craig Topper 908e685102 Mark FMA4 instructions as commutable and add them to the folding tables.
llvm-svn: 163035
2012-08-31 23:10:34 +00:00
Michael Liao 3224543bf9 Fix PR12359
- In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as
  well as PSHUFB will zero elements with negative indices.

  Patch by Sriram Murali <sriram.murali@intel.com>

llvm-svn: 163018
2012-08-31 20:12:31 +00:00
Jack Carter b3f3b17e16 The instruction DINS may be transformed into DINSU or DEXTM depending
on the size of the extraction and its position in the 64 bit word.

This patch allows support of the dext transformations with mips64 direct
object output.

0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword

32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword

32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword

llvm-svn: 163010
2012-08-31 18:06:48 +00:00
Craig Topper c0387f6b23 Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted.
llvm-svn: 163001
2012-08-31 16:31:13 +00:00
Craig Topper c30fdbc46c Add support for converting llvm.fma to fma4 instructions.
llvm-svn: 162999
2012-08-31 15:40:30 +00:00
Jakob Stoklund Olesen 96f87069c4 Don't enforce ordered inline asm operands.
I was too optimistic, inline asm can have tied operands that don't
follow the def order.

Fixes PR13742.

llvm-svn: 162998
2012-08-31 15:34:59 +00:00
NAKAMURA Takumi 2762dadf2c llvm/test/CodeGen/X86/vec_select.ll: Fix failure on xmm-less hosts, to add -mattr=+sse2.
FIXME: Should this be tested with both +avx and -avx,+sse2?
llvm-svn: 162983
2012-08-31 10:02:22 +00:00
Jakob Stoklund Olesen d3bda3c5b9 Fix a couple of typos in EmitAtomic.
Thumb2 instructions are mostly constrained to rGPR, not tGPR which is
for Thumb1.

rdar://problem/12203728

llvm-svn: 162968
2012-08-31 02:08:34 +00:00
Jim Grosbach e423e865fe X86: Fix encoding of 'movd %xmm0, %rax'
The assembly string for the VMOVPQIto64rr instruction incorrectly lacked the 'v'
prefix, resulting in mis-assembly of the vanilla movd instruction.

llvm-svn: 162963
2012-08-31 00:30:30 +00:00
Pete Cooper e969340fea Take account of boolean vector contents when promoting a build vector from i1 to some other type. rdar://problem/12210060
llvm-svn: 162960
2012-08-30 23:58:52 +00:00
Owen Anderson d1545e3715 Try to make this test more generic to unbreak buildbots.
llvm-svn: 162958
2012-08-30 23:51:20 +00:00
Owen Anderson cc61f87cf7 Teach the DAG combiner to turn chains of FADDs (x+x+x+x+...) into FMULs by constants. This is only enabled in unsafe FP math mode, since it does not preserve rounding effects for all such constants.
llvm-svn: 162956
2012-08-30 23:35:16 +00:00
Michael Gottesman 2dc1120f15 [llvm] Updated the test fold-vector-select so that we test the vector selects exhaustively.
llvm-svn: 162953
2012-08-30 23:11:49 +00:00
Nadav Rotem ea973bda26 Currently targets that do not support selects with scalar conditions and vector operands - scalarize the code. ARM is such a target
because it does not support CMOV of vectors. To implement this efficientlyi, we broadcast the condition bit and use a sequence of NAND-OR
to select between the two operands. This is the same sequence we use for targets that don't have vector BLENDs (like SSE2).

rdar://12201387

llvm-svn: 162926
2012-08-30 19:17:29 +00:00
Michael Liao bbd10792c2 Introduce 'UseSSEx' to force SSE legacy encoding
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
  enabled.

  As the penalty of inter-mixing SSE and AVX instructions, we need
  prevent SSE legacy insn from being generated except explicitly
  specified through some intrinsics. For patterns supported by both
  SSE and AVX, so far, we force AVX insn will be tried first relying on
  AddedComplexity or position in td file. It's error-prone and
  introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, and etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
   selected by fast-isel due to complicated pattern and fast-isel
   fallback to materialize it from constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation is different from SSE one after fixing SSE/AVX
   inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
   'vmovaps'.

llvm-svn: 162919
2012-08-30 16:54:46 +00:00
Benjamin Kramer 17ce7117d0 Fix test case.
llvm-svn: 162913
2012-08-30 15:42:45 +00:00
Benjamin Kramer afdfdb5cff LoopRotate: Also rotate loops with multiple exits.
The old PHI updating code in loop-rotate was replaced with SSAUpdater a while
ago, it has no problems with comples PHIs. What had to be fixed is detecting
whether a loop was already rotated and updating dominators when multiple exits
were present.

This change increases overall code size a bit, mostly due to additional loop
unrolling opportunities. Passes test-suite and selfhost with -verify-dom-info.
Fixes PR7447.

Thanks to Andy for the input on the domtree updating code.

llvm-svn: 162912
2012-08-30 15:39:42 +00:00
Nadav Rotem d5f5777b77 It is illegal to transform (sdiv (ashr X c1) c2) -> (sdiv x (2^c1 * c2)),
because C always rounds towards zero.

Thanks Dirk and Ben.

llvm-svn: 162899
2012-08-30 11:23:20 +00:00
Tim Northover ca9f384ff8 Add support for moving pure S-register to NEON pipeline if desired
llvm-svn: 162898
2012-08-30 10:17:45 +00:00
Michael Liao 271f11b571 Should put test case under test/ExecutionEngine/MCJIT/
llvm-svn: 162885
2012-08-30 00:43:57 +00:00
Michael Liao 3c8980646b Fix PR13727
- The root cause is that target constant materialization in X86 fast-isel
  creates a PC-rel addressing which may overflow 32-bit range in non-Small code
  model if .rodata section is allocated too far away from code segment in
  MCJIT, which uses Large code model so far.
- Follow the similar logic to fix non-Small code model in fast-isel by skipping
  non-Small code model.

llvm-svn: 162881
2012-08-30 00:30:16 +00:00
Hal Finkel 1859d26528 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854
2012-08-29 20:22:24 +00:00
Benjamin Kramer 8bcc971174 Make MemoryBuiltins aware of TargetLibraryInfo.
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.

Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.

Fixes PR13694 and probably others.

llvm-svn: 162841
2012-08-29 15:32:21 +00:00
Jush Lu e87e559e62 [arm-fast-isel] Add support for ARM PIC.
llvm-svn: 162823
2012-08-29 02:41:21 +00:00
NAKAMURA Takumi e9205613f6 Create llvm/test/Object/Mips/lit.local.cfg to check Mips in targets_to_build.
llvm-svn: 162819
2012-08-29 01:37:57 +00:00
NAKAMURA Takumi e1f9fc11d4 llvm/test: [CMake] Add profile_rt-shared to deps.
llvm-svn: 162813
2012-08-29 00:37:56 +00:00
NAKAMURA Takumi 87abb0ee34 llvm/test/Analysis/Profiling: Mark 3 of them as REQUIRES: loadable_module.
FIXME: profile_rt.dll could be built on win32.
llvm-svn: 162811
2012-08-29 00:37:46 +00:00
Jack Carter fc93516642 Moved input for objdump test from Mips to Inputs.
llvm-svn: 162808
2012-08-29 00:10:48 +00:00
Manman Ren abbb01abea Profile: set branch weight metadata with data generated from profiling.
This patch implements ProfileDataLoader which loads profile data generated by
-insert-edge-profiling and updates branch weight metadata accordingly.

Patch by Alastair Murray.

llvm-svn: 162799
2012-08-28 22:21:25 +00:00
Jack Carter cd6b0e1368 The instruction DEXT may be transformed into DEXTU or DEXTM depending
on the size of the extraction and its position in the 64 bit word.

This patch allows support of the dext transformations with mips64 direct
object output.

0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32
DINS
The field is entirely contained in the right-most word of the doubleword

32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64
DINSM
The field straddles the words of the doubleword

32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32
DINSU
The field is entirely contained in the left-most word of the doubleword

llvm-svn: 162782
2012-08-28 20:07:41 +00:00
Jack Carter 551efd7fd9 Some of the instructions in the Mips instruction set are revision
delimited. llvm-mc -disassemble access these through the -mattr
option.

llvm-objdump -disassemble had no such way to set the attribute so
some instructions were just not recognized for disassembly.

This patch accepts llvm-mc mechanism for specifying the attributes.

llvm-svn: 162781
2012-08-28 19:24:49 +00:00
Jack Carter c20a21b855 Some instructions are passed to the assembler to be
transformed to the final instruction variant. An
example would be dsrll which is transformed into 
dsll32 if the shift value is greater than 32.

For direct object output we need to do this transformation
in the codegen. If the instruction was inside branch
delay slot, it was being missed. This patch corrects this
oversight.

llvm-svn: 162779
2012-08-28 19:07:39 +00:00
Roman Divacky 8c4b6a307e Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778
2012-08-28 19:06:55 +00:00
Hal Finkel 742b535e40 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764
2012-08-28 16:12:39 +00:00
Benjamin Kramer 9c0a807c27 InstCombine: Guard the transform introduced in r162743 against large ints and non-const shifts.
llvm-svn: 162751
2012-08-28 13:08:13 +00:00
Nadav Rotem d457787fed Make sure that we don't call getZExtValue on values > 64 bits.
Thanks Benjamin for noticing this.

llvm-svn: 162749
2012-08-28 12:23:22 +00:00
Nadav Rotem 11935b29f3 Teach InstCombine to canonicalize [SU]div+[AL]shl patterns.
For example:
  %1 = lshr i32 %x, 2
  %2 = udiv i32 %1, 100

rdar://12182093

llvm-svn: 162743
2012-08-28 10:01:43 +00:00
Bill Wendling cc56718038 The commutative flag is already correctly set within the multiclass. If we set
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>

llvm-svn: 162741
2012-08-28 07:36:46 +00:00
Craig Topper bd509eea4a Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo.
llvm-svn: 162738
2012-08-28 07:05:28 +00:00
NAKAMURA Takumi cdfe1d1cdb llvm/test/CodeGen/X86/pr12312.ll: Add -mtriple=x86_64-unknown-unknown.
llvm-svn: 162736
2012-08-28 04:04:29 +00:00
Michael Liao b7d85b6328 Fix PR12312
- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
  Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
  X86ISD::OR node has only its flag result being used as a boolean value and
  all its leaves are extracted from the same vector, it could be folded into an
  X86ISD::PTEST node.

llvm-svn: 162735
2012-08-28 03:34:40 +00:00
Jakob Stoklund Olesen 87cb471e52 Remove extra MayLoad/MayStore flags from atomic_load/store.
These extra flags are not required to properly order the atomic
load/store instructions. SelectionDAGBuilder chains atomics as if they
were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on
the memory operands of all atomic operations.

The volatile bit is enough to order atomic loads and stores during and
after SelectionDAG.

This means we set mayLoad on atomic_load, mayStore on atomic_store, and
mayLoad+mayStore on the remaining atomic read-modify-write operations.

llvm-svn: 162733
2012-08-28 03:11:32 +00:00
Akira Hatanaka b5af7121b1 Fix mips' long branch pass.
Instructions emitted to compute branch offsets now use immediate operands
instead of symbolic labels. This change was needed because there were problems
when R_MIPS_HI16/LO16 relocations were used to make shared objects.

llvm-svn: 162731
2012-08-28 03:03:05 +00:00
Akira Hatanaka adb14f56c7 Fix bug 13532.
In SelectionDAGLegalize::ExpandLegalINT_TO_FP, expand INT_TO_FP nodes without
using any f64 operations if f64 is not a legal type.

Patch by Stefan Kristiansson. 

llvm-svn: 162728
2012-08-28 02:12:42 +00:00
Hal Finkel 686f2ee226 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727
2012-08-28 02:10:33 +00:00
Hal Finkel 5ab378037f Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

llvm-svn: 162725
2012-08-28 02:10:27 +00:00
Hal Finkel e39526a789 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.

Patch by Tobias von Koch.

llvm-svn: 162724
2012-08-28 02:10:15 +00:00
Bill Wendling 988a47d7e5 Make sure we add the predicate after all of the registers are added.
<rdar://problem/12183003>

llvm-svn: 162703
2012-08-27 22:12:44 +00:00
NAKAMURA Takumi fee50c8cf1 llvm/test/CodeGen/X86/fma.ll: Add -march=x86, or two tests would fail on non-x86 hosts.
llvm-svn: 162667
2012-08-27 11:50:26 +00:00
NAKAMURA Takumi 10eb4cfc3e llvm/test/CodeGen/X86/fma_patterns.ll: Add -mtriple=x86_64. It was incompatible on i686 and Windows x64.
llvm-svn: 162664
2012-08-27 09:37:54 +00:00
Craig Topper bfc1d0ed48 Commit test change for r162658.
llvm-svn: 162660
2012-08-27 07:55:50 +00:00
Alexey Samsonov 034e57a297 Add basic support for .debug_ranges section to LLVM's DebugInfo library.
This section (introduced in DWARF-3) is used to define instruction address
ranges for functions that are not contiguous and can't be described
by low_pc/high_pc attributes (this is the usual case for inlined subroutines).
The patch is the first step to support fetching complete inlining info from DWARF.

Reviewed by Benjamin Kramer.

llvm-svn: 162657
2012-08-27 07:17:47 +00:00
Anitha Boyapati 0dd589c5f1 FMA3 tests on bdver2 target for changes made in rev 162012. Also made
corresponding changes to existing tests for darwin triple to ensure that
same pattern is tested for bdver2 target.

llvm-svn: 162655
2012-08-27 06:59:01 +00:00
Craig Topper 9e4f0aae17 Make sure that FMA3 is favored even when FMA4 is also enabled. Test case for r162454.
llvm-svn: 162653
2012-08-27 03:38:15 +00:00
Jakob Stoklund Olesen c2272df1be Infer instruction properties from single-instruction patterns.
Previously, instructions without a primary patterns wouldn't get their
properties inferred. Now, we use all single-instruction patterns for
inference, including 'def : Pat<>' instances.

This causes a lot of instruction flags to change.

- Many instructions no longer have the UnmodeledSideEffects flag because
  their flags are now inferred from a pattern.

- Instructions with intrinsics will get a mayStore flag if they already
  have UnmodeledSideEffects and a mayLoad flag if they already have
  mayStore. This is because intrinsics properties are linear.

- Instructions with atomic_load patterns get a mayStore flag because
  atomic loads can't be reordered. The correct workaround is to create
  pseudo-instructions instead of using normal loads. PR13693.

llvm-svn: 162614
2012-08-24 22:46:53 +00:00
Akira Hatanaka 4a08a4a8b6 Disable Mips' delay slot filler when optimization level is O0.
llvm-svn: 162589
2012-08-24 20:40:15 +00:00
Akira Hatanaka e8e4ef102d In MipsDAGToDAGISel::SelectAddr, fold add node into address operand, if its
second operand is MipsISD::GPRel.

llvm-svn: 162584
2012-08-24 20:21:49 +00:00
Manman Ren cf10446ffa BranchProb: modify the definition of an edge in BranchProbabilityInfo to handle
the case of multiple edges from one block to another.

A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.

Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.

llvm-svn: 162572
2012-08-24 18:14:27 +00:00
Roman Divacky ace4707ea6 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.

llvm-svn: 162562
2012-08-24 16:26:02 +00:00
Eric Christopher bb69a27dbc Use DW_FORM_flag_present to save space in debug information if we're
not in darwin gdb compat mode.

Fixes rdar://10975088

llvm-svn: 162526
2012-08-24 01:14:27 +00:00
Eric Christopher acb7115bde Remove the DW_AT_MIPS_linkage name attribute when we don't need it
output (we're emitting a specification already and the information
isn't changing) and we're not in old gdb compat mode.

Saves 1% on the debug information for a build of llvm.

Fixes rdar://11043421

llvm-svn: 162493
2012-08-23 22:52:55 +00:00
Eric Christopher 20b76a77c3 Turn these two options in to trinary state so that they can be
turned on and off separate from the platform if you're on darwin.

llvm-svn: 162487
2012-08-23 22:36:40 +00:00
Eric Christopher 35ceacbcc2 Make this darwin specific to try to silence the bots.
llvm-svn: 162435
2012-08-23 07:18:46 +00:00
Eric Christopher 7782618271 Emit pubtypes only when going for darwin gdb compatibility.
rdar://10393214

llvm-svn: 162434
2012-08-23 07:10:56 +00:00
Eric Christopher 6cb5594979 Filecheck-ize.
llvm-svn: 162433
2012-08-23 07:10:51 +00:00
Benjamin Kramer e07728b936 SimplifyLibCalls: Give all safely-shrinkable libcalls the same treatment.
llvm-svn: 162383
2012-08-22 19:39:15 +00:00
Chad Rosier 1df1fb511f Whitespace.
llvm-svn: 162370
2012-08-22 17:34:11 +00:00
Chad Rosier 671dc096ae Add test case for r162368.
llvm-svn: 162369
2012-08-22 17:31:04 +00:00
Stepan Dyatkovskiy 99120e04be Rejected 169195. As Duncan commented, bitcasting to proper type is wrong approach. We need to insert some valid TRANCATE node here.
llvm-svn: 162354
2012-08-22 09:33:55 +00:00
Akira Hatanaka ad4950258b Add register Mips::GP to the list of reserved registers if target is bare-metal
to prevent it from being clobbered. mips uses $gp to access small data section.

This bug was originally reported by Carl Norum.

llvm-svn: 162340
2012-08-22 03:18:13 +00:00
Akira Hatanaka 9d957842e1 Add option disable-mips-delay-filler. Turn on mips' delay slot filler by
default.

Patch by Carl Norum.

llvm-svn: 162339
2012-08-22 02:51:28 +00:00
Jack Carter 77064c0590 For mips64 switch statements in subroutines could generate
within the codegen EK_GPRel64BlockAddress. This was not 
supported for direct object output and resulted in an assertion.

This change adds support for EK_GPRel64BlockAddress for 
direct object.

One fallout from this is to turn on rela relocations 
for mips64 to match gas.

llvm-svn: 162334
2012-08-22 00:49:30 +00:00
Rafael Espindola 2c06448360 Fix macros arguments with an underscore, dot or dollar in them. This is based
on a patch by Andy/PaX. I added the support for dot and dollar.

llvm-svn: 162298
2012-08-21 18:29:30 +00:00
Rafael Espindola af6da83a2c Make the wording in of the "expected identifier" error in the .macro directive
consistent with the other "expected identifier" errors.
Extracted from the Andy/PaX patch. I added the test.

llvm-svn: 162291
2012-08-21 17:12:05 +00:00
Tim Northover f39c1a3f72 Add correct set of regression tests for r162094 commit.
llvm-svn: 162276
2012-08-21 12:43:03 +00:00
Chandler Carruth c908ca1766 Port the global copy optimization from the SROA pass to InstCombine.
This optimization is really just replacing allocas wholesale with
globals, there is no scalarization.

The underlying motivation for this patch is to simplify the SROA pass
and focus it on splitting and promoting allocas.

llvm-svn: 162271
2012-08-21 08:39:44 +00:00
Kostya Serebryany f4be019fba [asan] add code to detect global initialization fiasco in C/C++. The sub-pass is off by default for now. Patch by Reid Watson. Note: this patch changes the interface between LLVM and compiler-rt parts of asan. The corresponding patch to compiler-rt will follow.
llvm-svn: 162268
2012-08-21 08:24:25 +00:00
Jakob Stoklund Olesen 74e6f9fc65 Add a missing def flag.
*** Bad machine code: Explicit definition marked as use ***
- function:    test_cos
- basic block: BB#0 L.entry (0x7ff2a2024fd0)
- instruction: VSETLNi32 %D11, %D11<undef>, %R0, 0, pred:14, pred:%noreg, %Q5<imp-use,kill>, %Q5<imp-def>
- operand 0:   %D11

llvm-svn: 162247
2012-08-21 00:34:53 +00:00
Jakob Stoklund Olesen 7d33c5739f Don't add CFG edges for redundant conditional branches.
IR that hasn't been through SimplifyCFG can look like this:

  br i1 %b, label %r, label %r

Make sure we don't create duplicate Machine CFG edges in this case.

Fix the machine code verifier to accept conditional branches with a
single CFG edge.

llvm-svn: 162230
2012-08-20 21:39:52 +00:00
Michael Liao 10ff96ce8c fix a case where all operands of BUILD_VECTOR are undefined
llvm-svn: 162214
2012-08-20 17:59:18 +00:00
Stepan Dyatkovskiy 6ee89aafc8 Forget to add testcase for r162195. Sorry.
llvm-svn: 162196
2012-08-20 08:03:18 +00:00
Nadav Rotem 178250ad87 When unsafe math is used, we can use commutative FMAX and FMIN. In some cases
this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
2012-08-19 13:06:16 +00:00
Benjamin Kramer 9d03242fcf InstCombine: Fix a crasher when encountering a function pointer.
llvm-svn: 162180
2012-08-18 22:04:34 +00:00
Jakob Stoklund Olesen dded061f85 Also combine zext/sext into selects for ARM.
This turns common i1 patterns into predicated instructions:

  (add (zext cc), x) -> (select cc (add x, 1), x)
  (add (sext cc), x) -> (select cc (add x, -1), x)

For a function like:

  unsigned f(unsigned s, int x) {
    return s + (x>0);
  }

We now produce:

  cmp r1, #0
  it  gt
  addgt.w r0, r0, #1

Instead of:

  movs  r2, #0
  cmp r1, #0
  it  gt
  movgt r2, #1
  add r0, r2

llvm-svn: 162177
2012-08-18 21:25:22 +00:00
Jakob Stoklund Olesen aab43dbfbb Also pass logical ops to combineSelectAndUse.
Add these transformations to the existing add/sub ones:

  (and (select cc, -1, c), x) -> (select cc, x, (and, x, c))
  (or  (select cc, 0, c), x)  -> (select cc, x, (or, x, c))
  (xor (select cc, 0, c), x)  -> (select cc, x, (xor, x, c))

The selects can then be transformed to a single predicated instruction
by peephole.

This transformation will make it possible to eliminate the ISD::CAND,
COR, and CXOR custom DAG nodes.

llvm-svn: 162176
2012-08-18 21:25:16 +00:00
Benjamin Kramer 8c2a733c55 InstCombine: Add a couple of fabs identities for comparing with 0.0.
llvm-svn: 162174
2012-08-18 20:06:47 +00:00
Benjamin Kramer 000132454c SimplifyLibcalls: Add fabs and trunc to the list of libcalls that are safe to shrink from double to float.
llvm-svn: 162173
2012-08-18 19:27:32 +00:00
Nadav Rotem a136939fa9 Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code.
llvm-svn: 162172
2012-08-18 17:53:03 +00:00
Nadav Rotem c324af609e Revert r162160 because it made a few buildbots fail.
llvm-svn: 162164
2012-08-18 05:02:36 +00:00
Nadav Rotem 2cb14a5c4b The X86 backend has a number of optimizations for SETCC nodes which use
arithmetic instructions. However, when small data types are used, a truncate
node appears between the SETCC node and the arithmetic operation. This patch
adds support for this pattern.

Before:
  xorl  %esi, %edi
  testb %dil, %dil
  setne %al
  ret

After:
  xorb  %dil, %sil
  setne %al
  ret

rdar://12081007

llvm-svn: 162160
2012-08-18 02:43:28 +00:00
Eli Friedman 79a6b30d8a Make atomic load and store of pointers work. Tighten verification of atomic operations
so other unexpected operations don't slip through.  Based on patch by Logan Chien.
PR11786/PR13186.

llvm-svn: 162146
2012-08-17 23:24:29 +00:00
Jakob Stoklund Olesen 7b1a2e8f02 Avoid folding ADD instructions with FI operands.
PEI can't handle the pseudo-instructions. This can be removed when the
pseudo-instructions are replaced by normal predicated instructions.

Fixes PR13628.

llvm-svn: 162130
2012-08-17 20:55:34 +00:00
Benjamin Kramer 34764fe2e4 MemoryBuiltins: Properly guard ObjectSizeOffsetVisitor against cycles in the IR.
The previous fix only checked for simple cycles, use a set to catch longer
cycles too.

Drop the broken check from the ObjectSizeOffsetEvaluator. The BoundsChecking
pass doesn't have to deal with invalid IR like InstCombine does.

llvm-svn: 162120
2012-08-17 19:26:41 +00:00
Bill Wendling 34bc34ecae Change the `linker_private_weak_def_auto' linkage to `linkonce_odr_auto_hide' to
make it more consistent with its intended semantics.

The `linker_private_weak_def_auto' linkage type was meant to automatically hide
globals which never had their addresses taken. It has nothing to do with the
`linker_private' linkage type, which outputs the symbols with a `l' (ell) prefix
among other things.

The intended semantic is more like the `linkonce_odr' linkage type.

Change the name of the linkage type to `linkonce_odr_auto_hide'. And therefore
changing the semantics so that it produces the correct output for the linker.

Note: The old linkage name `linker_private_weak_def_auto' will still parse but
is not a synonym for `linkonce_odr_auto_hide'. This should be removed in 4.0.
<rdar://problem/11754934>

llvm-svn: 162114
2012-08-17 18:33:14 +00:00
Rafael Espindola 9a16735e22 Assert that dominates is not given a multiple edge. Finding out if we have
multiple edges between two blocks is linear. If the caller is iterating all
edges leaving a BB that would be a square time algorithm. It is more efficient
to have the callers handle that case.

Currently the only callers are:
* GVN: already avoids the multiple edge case.
* Verifier: could only hit this assert when looking at an invalid invoke. Since
it already rejects the invoke, just avoid computing the dominance for it.

llvm-svn: 162113
2012-08-17 18:21:28 +00:00
Benjamin Kramer ca7ca4f6c6 TargetLowering: Use the large shift amount during legalize types. The legalizer may call us with an overly large type.
llvm-svn: 162101
2012-08-17 15:54:21 +00:00
Benjamin Kramer 4901f0d2a2 Guard MemoryBuiltins against self-looping GEPs, which can occur in unreachable code due to constant propagation.
Fixes PR13621.

llvm-svn: 162098
2012-08-17 14:16:37 +00:00
Benjamin Kramer 2f47a3fb07 Fix broken check lines.
I really need to find a way to automate this, but I can't come up with a regex
that has no false positives while handling tricky cases like custom check
prefixes.

llvm-svn: 162097
2012-08-17 12:28:26 +00:00
Tim Northover f66181530f Implement NEON domain switching for scalar <-> S-register vmovs on ARM
llvm-svn: 162094
2012-08-17 11:32:52 +00:00
Jakob Stoklund Olesen 0ea1fce6b4 Add ADD and SUB to the predicable ARM instructions.
It is not my plan to duplicate the entire ARM instruction set with
predicated versions. We need a way of representing predicated
instructions in SSA form without requiring a separate opcode.

Then the pseudo-instructions can go away.

llvm-svn: 162061
2012-08-16 23:21:55 +00:00
Rafael Espindola cc80cdebb9 Teach GVN to reason about edges dominating uses. This allows it to handle cases
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.

This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.

The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.

llvm-svn: 162023
2012-08-16 15:09:43 +00:00
Jush Lu 26088cb30e [arm-fast-isel] Add support for fastcc.
Without fastcc support, the caller just falls through to CallingConv::C
for fastcc, but callee still uses fastcc, this inconsistency of calling
convention is a problem, and fastcc support can fix it.

llvm-svn: 162013
2012-08-16 05:15:53 +00:00
Akira Hatanaka 269d3fd101 Test case for r162008.
llvm-svn: 162009
2012-08-16 03:48:41 +00:00
Jakob Stoklund Olesen 6cb96120f1 Fold predicable instructions into MOVCC / t2MOVCC.
The ARM select instructions are just predicated moves. If the select is
the only use of an operand, the instruction defining the operand can be
predicated instead, saving one instruction and decreasing register
pressure.

This implementation can turn AND/ORR/EOR instructions into their
corresponding ANDCC/ORRCC/EORCC variants. Ideally, we should be able to
predicate any instruction, but we don't yet support predicated
instructions in SSA form.

llvm-svn: 161994
2012-08-15 22:16:39 +00:00
Bill Wendling 2c8685e327 Rework test so that it reproduces the error without the horrible flag.
llvm-svn: 161989
2012-08-15 21:10:18 +00:00
Bill Wendling d63f1f5a9c Remove invalid test. This test requires that dead basic blocks be kept
around. That's not how we do things. Besides, the commit message tells us that
it is covered by the GCC test suite.

------------------------------------------------------------------------
r127497 | zwarich | 2011-03-11 13:51:56 -0800 (Fri, 11 Mar 2011) | 3 lines

Fix the GCC test suite issue exposed by r127477, which was caused by stack
protector insertion not working correctly with unreachable code. Since that
revision was rolled out, this test doesn't actual fail before this fix.
------------------------------------------------------------------------

llvm-svn: 161985
2012-08-15 20:54:09 +00:00
Evan Cheng eec6bc6270 Use vld1/vst1 to load/store f64 if alignment is < 4 and the target allows unaligned access. rdar://12091029
llvm-svn: 161962
2012-08-15 17:44:53 +00:00
Michael Liao 69e172a6f0 fix infinite loop in instcombine with more than 4GB memcpy
- memcpy size is wrongly truncated into 32-bit and treat 8GB memcpy is
  0-sized memcpy
- as 0-sized memcpy/memset is already removed before SimplifyMemTransfer
  and SimplifyMemSet in visitCallInst, replace 0 checking with
  assertions.
- replace getZExtValue() with getLimitedValue() according to
  Eli Friedman

llvm-svn: 161923
2012-08-15 03:49:59 +00:00
Anton Korobeynikov c6d945b11a The names of VFP variants of half-to-float conversion instructions were
reversed. This leads to wrong codegen for float-to-half conversion
intrinsics which are used to support storage-only fp16 type.
NEON variants of same instructions are fine.

llvm-svn: 161907
2012-08-14 23:36:01 +00:00
Michael Liao 34107b9177 fix PR11334
- FP_EXTEND only support extending from vectors with matching elements.
  This results in the scalarization of extending to v2f64 from v2f32,
  which will be legalized to v4f32 not matching with v2f64.
- add X86-specific VFPEXT supproting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover back the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector width.

llvm-svn: 161894
2012-08-14 21:24:47 +00:00
Kostya Serebryany bf479714f9 [asan] insert crash basic blocks inline as opposed to inserting them at the end of the function. This doesn't seem to fix or break anything, but is considered to be more friendly to downstream passes (test change)
llvm-svn: 161871
2012-08-14 14:05:50 +00:00
Craig Topper 2a40418a99 Change greater than to greater than or equal so that an identical sized store to the same offset is treated as completing overwriting.
llvm-svn: 161857
2012-08-14 07:32:05 +00:00
Nadav Rotem 70409991bc During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998

llvm-svn: 161852
2012-08-14 05:19:07 +00:00
NAKAMURA Takumi 245920463d llvm/test/CodeGen/ARM/floorf.ll: Add explicit -mtriple=arm-unknown-unknown, or it fails on msvc.
llvm-svn: 161825
2012-08-14 00:56:06 +00:00