Commit Graph

16812 Commits

Author SHA1 Message Date
Michael Liao e7e828fd64 fix PR13577, an issue introduced by r161687
- FCMOV only supports a subset of X86 conditions. Skip boolean
  simplification if the X86 condition is not valid for FCMOV.
- add a minimal test case for PR13577.

llvm-svn: 161732
2012-08-11 23:47:06 +00:00
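
A C-level illustration of the kind of code affected (function name invented; this only matters when the selected values live in x87 registers): the select condition below produces a signed condition code, which FCMOV cannot encode, so the boolean simplification must be skipped.

  /* Sketch only: FCMOV handles conditions derived from ZF/CF/PF (B, E, BE, U
   * and their negations) but not signed SF/OF-based ones such as "greater". */
  double pick(int a, int b, double x, double y) {
    return a > b ? x : y;  /* "a > b" is a signed condition FCMOV cannot test */
  }
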
Benjamin Kramer ef6494f24d PR13578: Teach MachineCSE that instructions that use a constant register can be CSE'd safely.
This is common e.g. when doing rip-relative addressing on x86_64.

llvm-svn: 161728
2012-08-11 19:05:13 +00:00
Manman Ren 1acb6707cd X86: when we are auto-detecting the subtarget features, make sure we turn on
FeatureFastUAMem for Nehalem, Westmere and Sandy Bridge.

FeatureFastUAMem is already on if we pass in nehalem or westmere as a
command-line argument.

rdar: 7252306
llvm-svn: 161717
2012-08-10 23:43:32 +00:00
Eli Friedman 4c923b3b3f The normal edge of an invoke is not allowed to branch to a block with a
landingpad.  Enforce it in the verifier, and fix the regression tests to match.

llvm-svn: 161697
2012-08-10 20:55:20 +00:00
Michael Liao 5248e9913f add X86-specific DAG optimization to simplify boolean test
- if a boolean test (X86ISD::CMP or X86ISD:SUB) checks a boolean value
  generated from X86ISD::SETCC, try to simplify the boolean value
  generation and checking by reusing the original EFLAGS with proper
  condition code
- add hooks to X86-specific SETCC/BRCOND/CMOV, the three major places
  consuming EFLAGS

part of the patches fixing PR12312

llvm-svn: 161687
2012-08-10 19:58:13 +00:00
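
A plausible C-level shape of the pattern this combine targets (names are illustrative): the flag from the first compare is materialized by SETCC and then tested again, and the new hooks let the consumer reuse the original EFLAGS with the proper condition code.

  /* Sketch: "lt" is produced by CMP + SETcc; the later boolean test of "lt"
   * (a CMP/SUB against it) can be folded to reuse the original CMP's EFLAGS. */
  int sign_select(int a, int b) {
    int lt = a < b;
    return lt ? -1 : 1;
  }
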
Pete Cooper 0deca6be79 Fix crash when doing LTO on Bullet. Dynamic GEPs in SROA were incorrectly being applied to all accesses to an alloca, not just the ones which read from the GEP. Thanks to Evan for reducing the test. rdar://11861001
llvm-svn: 161654
2012-08-10 03:26:36 +00:00
Jakob Stoklund Olesen 8c28ac9ec9 Update edge weights correctly in replaceSuccessor().
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.

llvm-svn: 161653
2012-08-10 03:23:27 +00:00
Jakob Stoklund Olesen d9b66506a3 Reapply r161633-161634 "Partition use lists so defs always come before uses."
No changes to these patches; MRI needed to be notified when changing
uses into defs and vice versa.

llvm-svn: 161644
2012-08-10 00:21:30 +00:00
Jakob Stoklund Olesen acd27c9279 Revert r161633-161634 "Partition use lists so defs always come before uses."
These commits broke a number of buildbots.

llvm-svn: 161640
2012-08-09 23:31:36 +00:00
Jakob Stoklund Olesen df01e00710 Partition use lists so defs always come before uses.
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.

In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.

Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).

llvm-svn: 161634
2012-08-09 22:49:46 +00:00
Jakob Stoklund Olesen 7d7051ca3c Don't use pointer-pointers for the register use lists.
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.

The test changes are required because the register allocation hint is
using the use-list order to break ties.

llvm-svn: 161633
2012-08-09 22:49:42 +00:00
Jakob Stoklund Olesen 4238a89db8 Don't modify MO while use_iterator is still pointing to it.
llvm-svn: 161626
2012-08-09 22:08:24 +00:00
Chandler Carruth 4cbe653ace Teach the LLVM test makefile to run the extra Clang tools' test suites
as part of check-all.

llvm-svn: 161610
2012-08-09 20:26:41 +00:00
Jack Carter 120a30a732 Another 32 to 64 bit sign extension bug.
The fields in the td definition were switched.

llvm-svn: 161607
2012-08-09 19:43:18 +00:00
Arnold Schwaighofer 81b2eec1ab Patch to implement UMLAL/SMLAL instructions for the ARM architecture
This patch corrects the definition of the umlal/smlal instructions and adds
support for matching them in the ARM DAG combiner.

Bug 12213

Patch by Yin Ma!

llvm-svn: 161581
2012-08-09 15:25:52 +00:00
Nadav Rotem e0f84d31c8 Fix the legalization of ExtLoad on ARM. ExpandUnalignedLoad did not properly
handle the cases where the memory value type was illegal.
PR13111.

llvm-svn: 161565
2012-08-09 01:56:44 +00:00
Bob Wilson 4c65c505e0 Add test triples to fix win32 failures. Revert workaround from r161292.
I don't have a win32 system to test, so hopefully I got them all fixed here.

llvm-svn: 161519
2012-08-08 20:31:37 +00:00
NAKAMURA Takumi ec934cd64d llvm/test/MC/COFF/seh.s: Fixup corresponding to r161487.
llvm-svn: 161489
2012-08-08 13:27:04 +00:00
Bill Wendling ce4fe41d21 Add `.pushsection', `.popsection', and `.previous' directives to Darwin ASM.
There are situations where inline ASM may want to change the section -- for
instance, to create a variable in the .data section. However, it cannot do this
without (potentially) restoring to the wrong section. E.g.:

  asm volatile (".section __DATA, __data\n\t"
                ".globl _fnord\n\t"
                "_fnord: .quad 1f\n\t"
                ".text\n\t"
                "1:" :::);

This may be wrong if this is inlined into a function that has a "section"
attribute. The user should use `.pushsection' and `.popsection' here instead.

The addition of `.previous' is added for completeness.
<rdar://problem/12048387>

llvm-svn: 161477
2012-08-08 06:30:30 +00:00
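
A sketch of the recommended usage applied to the example above, assuming a Darwin target: bracketing the section switch with `.pushsection'/`.popsection' restores whatever section was active, so the code works even when inlined into a function with a "section" attribute.

  void emit_fnord(void) {
    asm volatile (".pushsection __DATA, __data\n\t"
                  ".globl _fnord\n\t"
                  "_fnord: .quad 1f\n\t"
                  ".popsection\n\t"
                  "1:" :::);  /* the label lands back in the original section */
  }
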
Eli Friedman 08ec0a8122 isAllocLikeFn is allowed to return true for functions which read memory; make
sure we account for that correctly in DeadStoreElimination.  Fixes a regression
from r158919.  PR13547.

llvm-svn: 161468
2012-08-08 02:17:32 +00:00
Manman Ren 1be131ba27 X86: enable CSE between CMP and SUB
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.

Removed pattern matching of (a>b) ? (a-b):0 and the like, since they are
handled by the peephole now.

rdar://11873276

llvm-svn: 161462
2012-08-08 00:51:41 +00:00
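
The commit's own example pattern written out as C (function name invented): the comparison and the subtraction can now share a single SUB after CSE, and the peephole converts the SUB back to CMP when only the flags are needed.

  unsigned saturating_sub(unsigned a, unsigned b) {
    return a > b ? a - b : 0;  /* "a > b" and "a - b" can share one SUB */
  }
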
Dan Gohman b948736002 Avoid recomputing the unique exit blocks and their insert points when doing
multiple scalar promotions on a single loop. This also has the effect of
preserving the order of stores sunk out of loops, which is aesthetically
pleasing, and it happens to fix the testcase in PR13542, though it doesn't
fix the underlying problem.

llvm-svn: 161459
2012-08-08 00:00:26 +00:00
Bob Wilson 61f3ad5759 Fix a serious typo in InstCombine's optimization of comparisons.
An unsigned value converted to floating-point will always be greater than
a negative constant.  Unfortunately InstCombine reversed the check so that
unsigned values were being optimized to always be greater than all positive
floating-point constants.  <rdar://problem/12029145>

llvm-svn: 161452
2012-08-07 22:35:16 +00:00
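
A minimal C illustration of the intended fold (function names invented): an unsigned value converted to floating point is never less than a negative constant, so the first comparison may fold to false; comparisons against positive constants, as in the second function, must not be folded.

  #include <stdbool.h>

  bool below_negative(unsigned x) { return (double)x < -1.0; }   /* always false */
  bool below_positive(unsigned x) { return (double)x < 100.0; }  /* must not fold */
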
Evan Cheng fbdd25c135 X86 cmp lowering is looking past truncate on the condition node. It should only
do so when the high bits are known zero. This caused a subtle miscompilation.

rdar://12027825 

llvm-svn: 161451
2012-08-07 22:21:00 +00:00
Alexey Samsonov 947228c4f7 Fix the representation of debug line table in DebugInfo LLVM library,
and "instruction address -> file/line" lookup.

Instead of a plain collection of rows, the debug line table for a compilation unit
is now treated as a set of row ranges describing sequences (series of contiguous
machine instructions). The sequences are not always listed in order of increasing
address, so the previously used std::lower_bound() sometimes produced wrong results.
Now the instruction address lookup consists of two stages: finding the correct
sequence, and searching for the address in that sequence's range of rows.

llvm-svn: 161414
2012-08-07 11:46:57 +00:00
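
A hypothetical C sketch of the two-stage lookup described above; the types are invented stand-ins for the DebugInfo structures, and linear scans stand in for the binary searches to keep the sketch short.

  #include <stddef.h>
  #include <stdint.h>

  struct Row      { uint64_t Address; unsigned File, Line; };
  struct Sequence { uint64_t LowPC, HighPC; const struct Row *Rows; size_t NumRows; };

  /* Stage 1: find the sequence whose address range contains Addr.
   * Stage 2: find the last row in that sequence at or below Addr. */
  const struct Row *lookupAddress(const struct Sequence *Seqs, size_t NumSeqs,
                                  uint64_t Addr) {
    for (size_t i = 0; i != NumSeqs; ++i) {
      if (Addr < Seqs[i].LowPC || Addr >= Seqs[i].HighPC)
        continue;
      const struct Row *Best = NULL;
      for (size_t j = 0; j != Seqs[i].NumRows; ++j)
        if (Seqs[i].Rows[j].Address <= Addr)
          Best = &Seqs[i].Rows[j];
      return Best;
    }
    return NULL;
  }
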
Benjamin Kramer c99d0e9186 PR13095: Give an inline cost bonus to functions using byval arguments.
We give a bonus for every argument because the argument setup is not needed
anymore when the function is inlined. With this patch we interpret byval
arguments as a compact representation of many arguments. The byval argument
setup is implemented in the backend as an inline memcpy, so to model the
cost as accurately as possible we take the number of pointer-sized elements
in the byval argument and give a bonus of 2 instructions for every one of
those. The bonus is capped at 8 elements, which is the number of stores
at which the x86 backend switches from an expanded inline memcpy to a real
memcpy. It would be better to use the real memcpy threshold from the backend,
but it's not available via TargetData.

This change brings the performance of c-ray in line with gcc 4.7. The included
test case tries to reproduce the c-ray problem to catch regressions for this
benchmark early, its performance is dominated by the inline decision of a
specific call.

This only has a small impact on most code, more on x86 and arm than on x86_64
due to the way the ABI works. When building LLVM for x86 it gives a small
inline cost boost to virtually any function using StringRef or STL allocators,
but only a 0.01% increase in overall binary size. The size of gcc compiled by
clang actually shrank by a couple of bytes with this patch applied, but not
significantly.

llvm-svn: 161413
2012-08-07 11:13:19 +00:00
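
A hedged C sketch of the kind of call site affected (types and names invented): on an ABI such as 32-bit x86 where the aggregate is passed byval, the callee's struct parameter counts as several pointer-sized "arguments" toward the inline bonus, capped at 8 elements.

  struct vec3 { double x, y, z; };

  /* Inlining scale() removes the byval argument setup (an implicit memcpy),
   * so the inliner now credits a bonus per pointer-sized element of vec3. */
  static struct vec3 scale(struct vec3 v, double k) {
    struct vec3 r = { v.x * k, v.y * k, v.z * k };
    return r;
  }

  double scaled_sum(struct vec3 v) {
    struct vec3 s = scale(v, 2.0);
    return s.x + s.y + s.z;
  }
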
Chandler Carruth 2f6cf4884c Fix PR13412, a nasty miscompile due to the interleaved
instsimplify+inline strategy.

The crux of the problem is that instsimplify was reasonably relying on
an invariant that is true within any single function, but is no longer
true mid-inline the way we use it. This invariant is that an argument
pointer != a local (alloca) pointer.

The fix is really lightweight though, and allows instsimplify to be
resilient to these situations: when checking the relationships to
function arguments, ensure that the arguments come from the same
function. If they come from different functions, then none of these
assumptions hold. All credit to Benjamin Kramer for coming up with this
clever solution to the problem.

llvm-svn: 161410
2012-08-07 10:59:59 +00:00
Chandler Carruth 881d0a7966 Add a much more conservative strategy for aligning branch targets.
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.

As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.

Fixes PR13265.

llvm-svn: 161409
2012-08-07 09:45:24 +00:00
Manman Ren cb36b8c2e6 MachineCSE: Update the heuristics for isProfitableToCSE.
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.

rdar://11393714 and rdar://11819721

llvm-svn: 161396
2012-08-07 06:16:46 +00:00
Jack Carter f4946cfbb9 The def for 64-bit sign extension neglected to
initialize fields of the class that it used.

The result was nonsense code.

Before:
0000000000000000 <foo>:
   0:    00441100     0x441100
   4:    03e00008     jr    ra
   8:    00000000     nop

After:
0000000000000000 <foo>:
   0:    00041000     sll    v0,a0,0x0
   4:    03e00008     jr    ra
   8:    00000000     nop 

llvm-svn: 161377
2012-08-07 00:35:22 +00:00
Jack Carter 612c66314c The Mips64InstrInfo.td definitions DynAlloc64 and LEA_ADDiu64
were using a class defined for 32-bit instructions, and
thus the instruction was for addiu instead of daddiu.

This was corrected by adding the instruction opcode as a
field in the base class to be filled in by the defs.

llvm-svn: 161359
2012-08-06 23:29:06 +00:00
Jack Carter 84491abb20 Mips relocations R_MIPS_HIGHER and R_MIPS_HIGHEST.
These two relocations give access to the
highest and the second-highest 16 bits
of a 64-bit object.

R_MIPS_HIGHER %higher(A+S)
The %higher(x) function is [ (((long long) x + 0x80008000LL) >> 32) & 0xffff ]. 

R_MIPS_HIGHEST %highest(A+S)
The %highest(x) function is [ (((long long) x + 0x800080008000LL) >> 48) & 0xffff ]. 

llvm-svn: 161348
2012-08-06 21:26:03 +00:00
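
The two relocation functions from the message transcribed as small C helpers for reference (the helper names are ours, not from the MIPS ABI):

  #include <stdint.h>

  /* %higher(x): bits 47..32, with the carry adjustment from the lower parts. */
  static uint16_t mips_higher(uint64_t x) {
    return (uint16_t)(((x + 0x80008000ULL) >> 32) & 0xffff);
  }

  /* %highest(x): bits 63..48, with the same carry adjustment. */
  static uint16_t mips_highest(uint64_t x) {
    return (uint16_t)(((x + 0x800080008000ULL) >> 48) & 0xffff);
  }
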
Hal Finkel 33e529d56b MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

llvm-svn: 161346
2012-08-06 21:21:44 +00:00
Craig Topper ab47fe4e16 Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.
llvm-svn: 161318
2012-08-06 06:22:36 +00:00
Craig Topper 812005e562 Update test to check for r161305
llvm-svn: 161307
2012-08-05 09:06:28 +00:00
Hal Finkel 70381a7b18 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

llvm-svn: 161302
2012-08-04 14:10:46 +00:00
Anton Korobeynikov 218aaf6d04 Add stack spill / reload instructions for DTriple and DQuad register classes, which
were missed for no reason. This fixes PR13377

llvm-svn: 161299
2012-08-04 13:16:12 +00:00
Bob Wilson 874886cd66 Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

llvm-svn: 161282
2012-08-03 23:29:17 +00:00
Akira Hatanaka 22bec282e9 1. Redo mips16 instructions to avoid multiple opcodes for the same instruction.
Change these to patterns.
2. Add another 16 instructions.

Patch by Reed Kotler.

llvm-svn: 161272
2012-08-03 22:57:02 +00:00
Bob Wilson fa59485b94 Fix memcmp code-gen to honor -fno-builtin.
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin.  I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.

llvm-svn: 161262
2012-08-03 21:26:18 +00:00
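
A minimal example of what the fix guarantees, assuming the file is compiled with -fno-builtin: the call below must remain a real call to the C library's memcmp instead of being expanded into custom compare code during selection.

  #include <string.h>

  int headers_equal(const unsigned char *a, const unsigned char *b) {
    return memcmp(a, b, 16) == 0;  /* stays a libc call under -fno-builtin */
  }
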
Bob Wilson 3e6fa462f3 Fall back to selection DAG isel for calls to builtin functions.
Fast isel doesn't currently have support for translating builtin function
calls to target instructions.  For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization.  Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel.  <rdar://problem/12008746>

llvm-svn: 161232
2012-08-03 04:06:28 +00:00
Jush Lu 4705da9020 [arm-fast-isel] Add support for shl, lshr, and ashr.
llvm-svn: 161230
2012-08-03 02:37:48 +00:00
NAKAMURA Takumi edac6ee6aa [CMake] Add yaml2obj to check-llvm.
llvm-svn: 161229
2012-08-03 00:45:32 +00:00
Matt Beaumont-Gay 99fc2e19a6 Move test yaml files under Inputs until they are converted to be the actual
test files.

llvm-svn: 161219
2012-08-02 21:52:49 +00:00
Manman Ren ba8122cc25 X86 Peephole: fold loads to the source register operand if possible.
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.

rdar://10554090 and rdar://11873276

llvm-svn: 161207
2012-08-02 19:37:32 +00:00
Michael J. Spencer 1ffd9de4f1 Add yaml2obj. A utility to convert YAML to binaries.
yaml2obj takes a textual description of an object file in YAML format
and outputs the binary equivalent. This greatly simplifies writing
tests that take binary object files as input.

llvm-svn: 161205
2012-08-02 19:16:56 +00:00
Akira Hatanaka fffad897f2 Set transient stack alignment in constructor of MipsFrameLowering and re-enable
test o32_cc_vararg.ll.

llvm-svn: 161189
2012-08-02 18:15:13 +00:00
Jiangning Liu fa18005a4c Support fpv4 for ARM Cortex-M4.
llvm-svn: 161163
2012-08-02 08:35:55 +00:00
Jiangning Liu 6a43bf7d74 Fix #13035, a bug with the Thumb LDRD/STRD instructions and a negative #0 offset.
llvm-svn: 161162
2012-08-02 08:29:50 +00:00
Jiangning Liu 288e1af8c8 Fix #13138, a bug in the encoding and decoding of the ARM DSB instruction.
llvm-svn: 161161
2012-08-02 08:21:27 +00:00
Jiangning Liu 10dd40e42d Fix #13241, a bug in the shift immediate operand of the ARM ADR instruction.
llvm-svn: 161159
2012-08-02 08:13:13 +00:00
NAKAMURA Takumi 7020f51622 llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Make sure this is testing without +avx.
FIXME: Could +avx be checked here too?
llvm-svn: 161156
2012-08-02 06:36:56 +00:00
NAKAMURA Takumi aaca1e690d llvm/test/CodeGen/X86/fold-pcmpeqd-1.ll: Rewrite expressions to pass regardless of PR11031.
- Relax to match even if epilogue (pop %ebp) were emitted.
  - Assume the return value is stored to %xmm0.

llvm-svn: 161155
2012-08-02 06:33:58 +00:00
Manman Ren 5759d01230 X86 Peephole: fold loads to the source register operand if possible.
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.

This patch is a rework of r160919 and was tested on clang self-host on my local
machine.

rdar://10554090 and rdar://11873276

llvm-svn: 161152
2012-08-02 00:56:42 +00:00
Eric Christopher b1b9451337 Temporarily revert c23b933d5f8be9b51a1d22e717c0311f65f87dcd. It's causing
failures in the debug testsuite and possibly PR13486.

llvm-svn: 161121
2012-08-01 18:19:01 +00:00
Matt Beaumont-Gay 7947aecaf1 Line endings.
llvm-svn: 161117
2012-08-01 16:42:35 +00:00
Nuno Lopes b1c473998d fix 'make check' when ocamlopt returns the compiler path with CFLAGS (and one of the cflags contains an '=' character)
llvm-svn: 161114
2012-08-01 15:50:34 +00:00
Elena Demikhovsky 3cb3b0045c Added FMA functionality to X86 target.
llvm-svn: 161110
2012-08-01 12:06:00 +00:00
Nick Lewycky fb78083b1c Stay rational; don't assert trying to take the square root of a negative value.
If it's negative, the loop is already proven to be infinite. Fixes PR13489!

llvm-svn: 161107
2012-08-01 09:14:36 +00:00
Akira Hatanaka d1c43cee24 Add definitions of two subclasses of MipsFrameLowering, Mips16FrameLowering and
MipsSEFrameLowering.

Implement MipsSEFrameLowering::hasReservedCallFrame. Call frames will not be
reserved if there is a call with a large call frame or there are variable sized
objects on the stack.

llvm-svn: 161090
2012-07-31 22:50:19 +00:00
Akira Hatanaka 02de0e4425 Let PEI::calculateFrameObjectOffsets compute the final stack size rather than
computing it in MipsFrameLowering::emitPrologue.

llvm-svn: 161078
2012-07-31 21:28:49 +00:00
Akira Hatanaka 33a25af5a8 Expand DYNAMIC_STACKALLOC nodes rather than doing custom-lowering.
The frame object which points to the dynamically allocated area will not be
needed after changes are made to cease reserving call frames.

llvm-svn: 161076
2012-07-31 20:54:48 +00:00
Akira Hatanaka beda2241a4 When store nodes or memcpy nodes are created to copy the function call
arguments to the stack in MipsISelLowering::LowerCall, use stack pointer and
integer offset operands rather than frame object operands.

llvm-svn: 161068
2012-07-31 18:46:41 +00:00
Chad Rosier 710be7df71 [x86 frame lowering] In 32-bit mode, use ESI as the base pointer.
Previously, we were using EBX, but PIC requires EBX to hold the GOT pointer
across function calls made via the PLT.

llvm-svn: 161066
2012-07-31 18:29:21 +00:00
Akira Hatanaka 4ce7c4060d Fix type of LUXC1 and SUXC1. These instructions were incorrectly defined as
single-precision load and store.

Also avoid selecting LUXC1 and SUXC1 instructions during isel. It is incorrect
to map unaligned floating point load/store nodes to these instructions.

llvm-svn: 161063
2012-07-31 18:16:49 +00:00
Manman Ren 8c549b586c MachineSink: Sort the successors before trying to find SuccToSinkTo.
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.

rdar://11980766

llvm-svn: 161062
2012-07-31 18:10:39 +00:00
Jakob Stoklund Olesen 0c807dfae2 Clear kill flags in removeCopyByCommutingDef().
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.

<rdar://problem/11950722>

llvm-svn: 161023
2012-07-31 02:47:24 +00:00
Manman Ren 2b6a0dfd4c Reverse the order of the two branches at the end of a basic block if it is profitable.
We branch to the successor with the higher edge weight first.
Convert from
     je    LBB4_8  --> to outer loop
     jmp   LBB4_14 --> to inner loop
to
     jne   LBB4_14
     jmp   LBB4_8

PR12750
rdar: 11393714

llvm-svn: 161018
2012-07-31 01:11:07 +00:00
Jim Grosbach 20666162e9 Keep empty assembly macro argument values in the middle of the list.
Empty macro arguments at the end of the list should be as-if not specified at
all, but those in the middle of the list need to be kept so as not to screw
up the positional numbering. E.g.:
.macro foo
foo_$0_$1_$2_$3:
  nop
.endm

foo 1, 2, 3, 4
foo 1, , 3, 4

Should create two labels, "foo_1_2_3_4" and "foo_1__3_4".

rdar://11948769

llvm-svn: 161002
2012-07-30 22:44:17 +00:00
Pete Cooper 91244268d7 Consider address spaces for hashing and CSEing DAG nodes. Otherwise two loads from different x86 segments but the same address would get CSEd
llvm-svn: 160987
2012-07-30 20:23:19 +00:00
Kevin Enderby 5c490f1b8f Fix a bug in ARMMachObjectWriter::RecordRelocation() in ARMMachObjectWriter.cpp
where the other_half of the movt and movw relocation entries needs to be set,
and only with the 16 bits of the other half.

rdar://10038370

llvm-svn: 160978
2012-07-30 18:46:15 +00:00
Nadav Rotem 77f1b9c477 When constant folding GEP expressions, keep the address space information of pointers.
Together with Ran Chachick <ran.chachick@intel.com>

llvm-svn: 160954
2012-07-30 07:25:20 +00:00
Manman Ren f87dd7c01b Revert r160920 and r160919 due to dragonegg and clang selfhost failure
llvm-svn: 160927
2012-07-29 02:44:09 +00:00
Nick Lewycky d2c3bdd269 Add testcases for GlobalOpt changes in r160693 and r160757.
llvm-svn: 160925
2012-07-29 01:15:37 +00:00
Manman Ren 9de95e779c X86 Peephole: fold loads to the source register operand if possible.
Trying to fix the bot by specifying a triple in the failing testing cases.

llvm-svn: 160920
2012-07-28 17:51:24 +00:00
Manman Ren 0fa3ab88ba X86 Peephole: fold loads to the source register operand if possible.
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.

rdar://10554090 and rdar://11873276

llvm-svn: 160919
2012-07-28 16:48:01 +00:00
Manman Ren 32367c063b X86 Peephole: fix PR13475 in optimizeCompare.
It is possible for an instruction to both use and update EFLAGS.
When checking safety, we should check for uses of EFLAGS before
declaring that it is safe to optimize based on the update.

llvm-svn: 160912
2012-07-28 03:15:46 +00:00
Eric Christopher 86ca9f9e11 Add a DW_AT_high_pc for CUs that are a single address range. Update
all tests accordingly.

Fixes PR13351.

Patch by shinichiro hamaji!

llvm-svn: 160899
2012-07-27 22:00:05 +00:00
Evan Cheng 249716e8ae Teach CodeGenPrep to look past a bitcast when duplicating a return instruction
into predecessor blocks to enable tail call optimization.

rdar://11958338

llvm-svn: 160894
2012-07-27 21:21:26 +00:00
Jakob Stoklund Olesen bc65e8f94e Add <imp-def> of super-register when lowering SUBREG_TO_REG.
Patch by Tyler Nowicki!

llvm-svn: 160888
2012-07-27 20:19:49 +00:00
Nuno Lopes 85591f899d fix PR13390: do not loop forever with self-referencing select instructions
llvm-svn: 160876
2012-07-27 18:21:15 +00:00
Nuno Lopes 20c7eb3549 fix infinite loop in instcombine in the presence of a (malformed) self-referencing select inst.
This can happen as long as the instruction is not reachable. Instcombine does generate these unreachable malformed selects when doing RAUW

llvm-svn: 160874
2012-07-27 18:03:57 +00:00
Pete Cooper abc13af9c6 Simplify demanded bits of select sources where the condition is a constant vector
llvm-svn: 160835
2012-07-26 23:10:24 +00:00
Pete Cooper e807e45bff Teach SimplifyDemandedBits how to look through fpext and fptrunc to simplify their operand
llvm-svn: 160823
2012-07-26 22:37:04 +00:00
Jakob Stoklund Olesen ceee4a9d0c Eliminate a batch of uses of sub_ss and sub_sd in the X86 target.
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves.  They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.

This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.

The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.

llvm-svn: 160816
2012-07-26 21:40:42 +00:00
Duncan Sands 5651452076 Stop reassociate from looking through expressions of arbitrary complexity. This
is a temporary measure until my fix for PR13021 is ready.

llvm-svn: 160778
2012-07-26 09:26:40 +00:00
Craig Topper c7690ac7ac Make l/q suffixes on AVX forms of scalar convert instructions consistent with their non-AVX forms.
llvm-svn: 160775
2012-07-26 07:48:28 +00:00
Akira Hatanaka 64626fc20f Fix call setup for PIC.
Patch by Reed Kotler.

llvm-svn: 160774
2012-07-26 02:24:43 +00:00
Manman Ren e8c6b15137 Update test case for Atom when disabling rematerialization in
TwoAddressInstructionPass.

The generated code for Atom has a different code sequence. This is related
to commit r160749.

llvm-svn: 160755
2012-07-25 20:17:14 +00:00
Nuno Lopes f0626f2205 revert r160742: it's breaking CMake build
original commit msg:
MemoryBuiltins: add support to determine the size of strdup'ed non-constant strings

llvm-svn: 160751
2012-07-25 18:49:28 +00:00
Manman Ren cc1dc6dc11 Disable rematerialization in TwoAddressInstructionPass.
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.

This also fixed rdar://11830760 where xor is expected instead of movi0.

llvm-svn: 160749
2012-07-25 18:28:13 +00:00
Nuno Lopes f0441e04bd MemoryBuiltins: add support to determine the size of strdup'ed non-constant strings
llvm-svn: 160742
2012-07-25 17:29:22 +00:00
Rafael Espindola 11c38b9657 When a return struct pointer is passed in registers, the callee has nothing
to pop.

llvm-svn: 160725
2012-07-25 13:41:10 +00:00
Duncan Sands 77a1f3b564 Don't perform an overaligned load in this test, since that's undefined
behaviour that might be exploited one day.

llvm-svn: 160714
2012-07-25 09:45:37 +00:00
Duncan Sands 0b875a0c29 When folding a load from a global constant, if the load started in the middle
of an array element (rather than at the beginning of the element) and extended
into the next element, then the load from the second element was being handled
wrong due to incorrect updating of the notion of which byte to load next.  This
fixes PR13442.  Thanks to Chris Smowton for reporting the problem, analyzing it
and providing a fix.

llvm-svn: 160711
2012-07-25 09:14:54 +00:00
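
A hedged C sketch of the PR13442 scenario (data and names invented): a constant-foldable load that starts partway into one array element and continues into the following elements, which is exactly where the byte-indexing bookkeeping went wrong.

  #include <stdint.h>
  #include <string.h>

  static const uint16_t table[4] = { 0x1122, 0x3344, 0x5566, 0x7788 };

  /* Reads 4 bytes starting 1 byte into table[0], so the folded load must
   * continue from the correct bytes of the next elements. */
  uint32_t read_straddling(void) {
    uint32_t v;
    memcpy(&v, (const char *)table + 1, sizeof v);
    return v;
  }
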
Akira Hatanaka 5a69c235ae Eliminate the stack slot used to save the global base register.
The long branch pass (fixed in r160601) no longer uses the global base register
to compute addresses of branch destinations, so it is not necessary to reserve
a slot on the stack.

llvm-svn: 160703
2012-07-25 03:16:47 +00:00
Rafael Espindola a92cf29f0d Add a cpu to the test. Should fix the atom bot.
llvm-svn: 160701
2012-07-24 22:56:06 +00:00
Rafael Espindola f30e9bfb90 Add a triple to the test.
llvm-svn: 160698
2012-07-24 21:55:04 +00:00
Rafael Espindola a44e193a11 In order to correctly compile
struct s {
  double x1;
  float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
  f(41, 42, 43);
}

We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.

llvm-svn: 160695
2012-07-24 21:40:17 +00:00
David Chisnall 5b8c1680de ELF does not imply GNU/Linux. Do not assume GNU conventions just because we
are targeting an ELF platform.  Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.

This fixes PR13438.

llvm-svn: 160687
2012-07-24 20:04:16 +00:00