Commit Graph

15877 Commits

Author SHA1 Message Date
Craig Topper 66a3597a4a Add vmfunc instruction to X86 assembler and disassembler.
llvm-svn: 150899
2012-02-19 01:39:49 +00:00
Rafael Espindola 82d957593e Don't skip debug instructions when looking for the insertion point of
the cast. If we do, we can end up with

   inst1
   ---------------  < Insertion point
   dbg inst
   new inst

instead of the desired

   inst1
   new inst
   ---------------  < Insertion point
   dbg inst

Another option would be for InsertNoopCastOfTo (or its callers) to move the
insertion point and we would end up with

   inst1
   dbg inst
   new inst
   ---------------  < Insertion point

but that complicates the callers. This fixes PR12018 (and firefox's build).

llvm-svn: 150884
2012-02-18 17:22:58 +00:00
Craig Topper ed7aa46366 Add X86 assembler and disassembler support for AMD SVM instructions. Original patch by Kay Tiong Khoo. Few tweaks by me for code density and to reduce replication.
llvm-svn: 150873
2012-02-18 08:19:49 +00:00
Eli Friedman 952d1f9f40 Fix a rather nasty regression from r150690: LHS != RHS does not imply LHS->stripPointerCasts() != RHS->stripPointerCasts().
llvm-svn: 150863
2012-02-18 03:29:25 +00:00
Eric Christopher c767c51b5d Testcase for the previous commit.
llvm-svn: 150852
2012-02-18 00:05:45 +00:00
Dan Gohman 0155f30a9c Calls and invokes with the new clang.arc.no_objc_arc_exceptions
metadata may still unwind, but only in ways that the ARC
optimizer doesn't need to consider. This permits more
aggressive optimization.

llvm-svn: 150829
2012-02-17 18:59:53 +00:00
David Chisnall 8fa1716508 It turns out that putting an 8-byte symbol in a 4-byte section makes Solaris ld sulk. GNU ld is perfectly happy with it, which is worrying for a whole other set of reasons...
Thanks to Anton, Duncan and Rafael for helping me track this down.
Pointy hat to Rafael for introducing the bug in the first place.

llvm-svn: 150811
2012-02-17 16:05:50 +00:00
Nick Lewycky aed0553360 Remove question.
llvm-svn: 150809
2012-02-17 09:55:20 +00:00
Nick Lewycky 68f9f9d9c8 Add support for invariant.start inside the static constructor evaluator. This is
useful to represent a variable that is const in the source but can't be constant
in the IR because of a non-trivial constructor. If globalopt evaluates the
constructor, and there was an invariant.start with no matching invariant.end
possible, it will mark the global constant afterwards.

llvm-svn: 150794
2012-02-17 06:59:21 +00:00
Chad Rosier fcd29ae390 [fast-isel] Add support for returning non-legal types with no sign- or zero-
entend flag.

llvm-svn: 150774
2012-02-17 01:21:28 +00:00
Bill Wendling ec713ee4b4 Use –mcpu=generic, so that the test will not fail when run on an Intel Atom
processor, due to the Atom scheduler producing an instruction sequence that is
different from that which is expected.
Patch by Michael Spencer!

llvm-svn: 150736
2012-02-16 22:42:48 +00:00
Benjamin Kramer b0d75c2f4e Disable machine copy propagation for now. It's known to be buggy (PR11940) and introduces subtle miscompiles in many places.
llvm-svn: 150703
2012-02-16 17:29:50 +00:00
Benjamin Kramer ea51f62e4b InstSimplify: Ignore pointer casts when constant folding compares between pointers.
llvm-svn: 150690
2012-02-16 13:49:39 +00:00
Eli Bendersky 924f9a671d Replace all instances of dg.exp file with lit.local.cfg, since all tests are run with LIT now and now Dejagnu. dg.exp is no longer needed.
Patch reviewed by Daniel Dunbar. It will be followed by additional cleanup patches.

llvm-svn: 150664
2012-02-16 06:28:33 +00:00
Eli Friedman c458885c58 loop-rotate shouldn't hoist alloca instructions out of a loop. Patch by Patrik Hägglund, with slightly modified test. Issue reported by Patrik Hägglund on llvmdev.
llvm-svn: 150642
2012-02-16 00:41:10 +00:00
Bill Wendling b028143119 Remove extraneous tests.
llvm-svn: 150636
2012-02-15 23:44:05 +00:00
Bill Wendling 6a26ab7f99 Add a test for generating Objective-C metadata from module flags.
llvm-svn: 150635
2012-02-15 23:43:37 +00:00
Bill Wendling 0b73456cc5 Add a test for the Objective-C garbage collection metadata stuff.
llvm-svn: 150626
2012-02-15 22:44:10 +00:00
David Meyer 44ec69efe0 For ELF, also call fixSymbolsInTLSFixups() on expressions passed to EmitValue (literal values). Previously only called on expressions in instructions. New test cases added to tls.s, tls-i386.s. Resolves PR11981.
llvm-svn: 150582
2012-02-15 15:09:06 +00:00
Pete Cooper c21ebf5c41 Stop custom lowering forr x86 DEC64m from happening if the load in the lowered sequence has more than 1 user
llvm-svn: 150537
2012-02-15 00:33:37 +00:00
Lang Hames 595111f221 Tighten physical register invariants: Allocatable physical registers can
only be live in to a block if it is the function entry point or a landing pad.

llvm-svn: 150494
2012-02-14 18:51:53 +00:00
Nadav Rotem 29984ba033 Fix PR12000. Some vector operations may use scalar operands with types
that are greater than the vector element type. For example BUILD_VECTOR
of type <1 x i1> with a constant i8 operand.
This patch fixes the assertion.

llvm-svn: 150477
2012-02-14 13:06:32 +00:00
Bill Wendling acb0f4795d Change error tests to coincide with message changes.
llvm-svn: 150467
2012-02-14 09:29:21 +00:00
Kostya Serebryany dc84eae1e0 [asan] fix asan-vs-gvn.ll test (it did not actually check much before this change)
llvm-svn: 150441
2012-02-14 00:02:35 +00:00
Andrew Trick 10cc45336d Add simplifyLoopLatch to LoopRotate pass.
This folds a simple loop tail into a loop latch. It covers the common (in fortran) case of postincrement loops. It's a "free" way to expose this type of loop to downstream loop optimizations that bail out on non-canonical loops (getLoopLatch is a heavily used check).

llvm-svn: 150439
2012-02-14 00:00:23 +00:00
Devang Patel 698452bc7e Check against umin while converting fcmp into an icmp.
llvm-svn: 150425
2012-02-13 23:05:18 +00:00
Dan Gohman eb6e01533a Just like in regular escape analysis, loads and stores through
(but not of) a block pointer do not cause the block pointer to
escape. This fixes rdar://10803830.

llvm-svn: 150424
2012-02-13 22:57:02 +00:00
Kostya Serebryany e2a0e4163a ThreadSanitizer, a race detector. First LLVM commit.
Clang patch (flags) will follow shortly.
The run-time library will also follow, but not immediately.

llvm-svn: 150423
2012-02-13 22:50:51 +00:00
Nadav Rotem 0c65064dbe Fix a bug in DAGCombine for the optimization of BUILD_VECTOR. We cant generate a shuffle node from two vectors of different types.
llvm-svn: 150383
2012-02-13 12:42:26 +00:00
Craig Topper 38f2c7f14f Revert accidental commit of a pruned testcase from r150360.
llvm-svn: 150361
2012-02-13 04:33:33 +00:00
Craig Topper 87119fa37f Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel.
llvm-svn: 150360
2012-02-13 04:30:38 +00:00
Pete Cooper 71be57bb32 Fixed bug when custom lowering DEC64m on x86.
If the DEC node had more than one user, it was doing this lowering but
leaving the original DEC node around and so decrementing twice.

Fixes PR11964.

llvm-svn: 150356
2012-02-13 00:10:03 +00:00
Nadav Rotem 34ca89afa8 This patch addresses the problem of poor code generation for the zext
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.

In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.

This patch modifes the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.

llvm-svn: 150340
2012-02-12 15:05:31 +00:00
Anton Korobeynikov c6b4017ce2 Add support for implicit TLS model used with MS VC runtime.
Patch by Kai Nacke!

llvm-svn: 150307
2012-02-11 17:26:53 +00:00
Bill Wendling 66f02413ef [WIP] Initial code for module flags.
Module flags are key-value pairs associated with the module. They include a
'behavior' value, indicating how module flags react when mergine two
files. Normally, it's just the union of the two module flags. But if two module
flags have the same key, then the resulting flags are dictated by the behaviors.

Allowable behaviors are:

     Error
       Emits an error if two values disagree.

     Warning
       Emits a warning if two values disagree.

     Require
       Emits an error when the specified value is not present
       or doesn't have the specified value. It is an error for
       two (or more) llvm.module.flags with the same ID to have
       the Require behavior but different values. There may be
       multiple Require flags per ID.

     Override
       Uses the specified value if the two values disagree. It
       is an error for two (or more) llvm.module.flags with the
       same ID to have the Override behavior but different
       values.

llvm-svn: 150300
2012-02-11 11:38:06 +00:00
Hal Finkel 1bde3f86d1 Update BBVectorize to use aliasesUnknownInst.
This allows BBVectorize to check the "unknown instruction" list in the
alias sets. This is important to prevent instruction fusing from reordering
function calls. Resolves PR11920.

llvm-svn: 150250
2012-02-10 15:52:40 +00:00
Duncan Sands 26641d7c02 Fix PR11948: the result type of an icmp may be a vector of boolean -
don't assume it is a boolean.

llvm-svn: 150247
2012-02-10 14:31:24 +00:00
Duncan Sands bf48ac622a Revert commit 149912 (lattner) and add a testcase that shows the problem (which
is that patterns no longer match for vectors of booleans, because you only get
ConstantDataVector when the vector element type is i8, i16, etc, not when it is
i1).  Original commit message:
Remove some dead code and tidy things up now that vectors use ConstantDataVector
instead of always using ConstantVector.

llvm-svn: 150246
2012-02-10 14:26:42 +00:00
Andrew Trick d3f8fe81f4 RegAlloc superpass: includes phi elimination, coalescing, and scheduling.
Creates a configurable regalloc pipeline.

Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa.

When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>.

CodeGen transformation passes are never "required" as an analysis

ProcessImplicitDefs does not require LiveVariables.

We have a plan to massively simplify some of the early passes within the regalloc superpass.

llvm-svn: 150226
2012-02-10 04:10:36 +00:00
Benjamin Kramer 487a3962c7 GlobalOpt: Be more aggressive about elminating side-effect free static dtors.
GlobalOpt runs early in the pipeline (before inlining) and complex class
hierarchies often introduce bitcasts or GEPs which weren't optimized away.
Teach it to ignore side-effect free instructions instead of depending on
other passes to remove them.

llvm-svn: 150174
2012-02-09 14:26:06 +00:00
James Molloy d9ba4fd48f Teach the MC and disassembler about SoftFail, and hook it up to UNPREDICTABLE on ARM. Wire this to tBLX in order to provide test coverage.
llvm-svn: 150169
2012-02-09 10:56:31 +00:00
NAKAMURA Takumi d5a504e52b test/CodeGen/X86/atom-lea-sp.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 150151
2012-02-09 05:12:58 +00:00
Evan Cheng 55e7d6aeff Commit Andy Zhang's test for the lea patch.
llvm-svn: 150107
2012-02-08 22:33:17 +00:00
Kostya Serebryany 154a54d972 [asan] unpoison the stack before every noreturn call. Fixes asan issue 37. llvm part
llvm-svn: 150102
2012-02-08 21:36:17 +00:00
Elena Demikhovsky 1adc1d53dd Fixed a bug in printing "cmp" pseudo ops.
> This IR code
> %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14)
> fails with assertion:
>
> llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst*, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed.
> 0  llc             0x0000000001355803
> 1  llc             0x0000000001355dc9
> 2  libpthread.so.0 0x00007f79a30575d0
> 3  libc.so.6       0x00007f79a23a1945 gsignal + 53
> 4  libc.so.6       0x00007f79a23a2f21 abort + 385
> 5  libc.so.6       0x00007f79a239a810 __assert_fail + 240
> 6  llc             0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const*, unsigned int, llvm::raw_ostream&) + 119

I added the full testing for all possible pseudo-ops of cmp.
I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp.

You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in.

llvm-svn: 150068
2012-02-08 08:37:26 +00:00
Chad Rosier 0ee8c513f7 [fast-isel] Add support for SUBs with non-legal types.
llvm-svn: 150047
2012-02-08 02:45:44 +00:00
Chad Rosier bfe7393d7d Add comment to test case.
llvm-svn: 150046
2012-02-08 02:30:12 +00:00
Chad Rosier bd471255a9 [fast-isel] Add support for ORs with non-legal types.
llvm-svn: 150045
2012-02-08 02:29:21 +00:00
Chad Rosier ded4c99f2e [fast-isel] Add support for indirect branches.
llvm-svn: 150014
2012-02-07 23:56:08 +00:00
Craig Topper b27fd77c3f Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD.
llvm-svn: 149968
2012-02-07 06:28:42 +00:00
Chad Rosier 685b20c114 [fast-isel] Add support for ADDs with non-legal types.
llvm-svn: 149934
2012-02-06 23:50:07 +00:00
Kostya Serebryany 9e0d377400 The patch resolves the conflict between AddressSanitizer and load widening (GVN).
The problem initially reported by Mozilla folks (http://code.google.com/p/address-sanitizer/issues/detail?id=20),
but it also prevents us from enabling LLVM bootstrap with AddressSanitizer.

llvm-svn: 149925
2012-02-06 22:48:56 +00:00
Bill Wendling 9ebc0896e1 The 'unwind' instruction is deprecated and will be removed, making this test
obsolete.

llvm-svn: 149880
2012-02-06 18:18:47 +00:00
Nick Lewycky 52da72b12a Teach GlobalOpt to handle atomic accesses to globals.
* Most of the transforms come through intact by having each transformed load or
store copy the ordering and synchronization scope of the original.
 * The transform that turns a global only accessed in main() into an alloca
(since main is non-recursive) with a store of the initial value uses an
unordered store, since it's guaranteed to be the first thing to happen in main.
(Threads may have started before main (!) but they can't have the address of a
function local before the point in the entry block we insert our code.)
 * The heap-SRoA transforms are disabled in the face of atomic operations. This
can probably be improved; it seems odd to have atomic accesses to an alloca
that doesn't have its address taken.

AnalyzeGlobal keeps track of the strongest ordering found in any use of the
global. This is more information than we need right now, but it's cheap to
compute and likely to be useful.

llvm-svn: 149847
2012-02-05 19:56:38 +00:00
Duncan Sands be65578460 Testcase for commit 149833 (use of an uninitialized variable noticed
by GCC).

llvm-svn: 149840
2012-02-05 19:27:57 +00:00
Duncan Sands 4b613497f0 Reduce the number of dom queries made by GVN's conditional propagation
logic by half: isOnlyReachableViaThisEdge was trying to be clever and
handle the case of a branch to a basic block which is contained in a
loop.  This costs a domtree lookup and is completely useless due to
GVN's position in the pass pipeline: all loops have preheaders at this
point, which means it is enough for isOnlyReachableViaThisEdge to check
that Dst has only one predecessor.  (I checked this theoretical argument
by running over the entire nightly testsuite, and indeed it is so!).

llvm-svn: 149838
2012-02-05 18:25:50 +00:00
Benjamin Kramer 93492b8765 Testing vector code without sse doesn't make much sense.
Should bring arm and ppc testers back to life (they default to -mcpu=generic)

llvm-svn: 149821
2012-02-05 11:19:39 +00:00
Chris Lattner 6987fdb620 Add a test for the miscompilation my recent ConstantDataArray patches introduced, to make sure
we don't regress on it in the future.

llvm-svn: 149803
2012-02-05 02:37:36 +00:00
Craig Topper 4daa67483d Remove most of the intrinsics for XOP VPCMOV instruction. They all aliased to the same instruction with different types. This would be better accomplished with casts in the not yet created xopintrin.h header file.
llvm-svn: 149795
2012-02-05 00:55:56 +00:00
Hal Finkel 135cac922c Boost the effective chain depth of loads and stores.
By default, boost the chain depth contribution of loads and stores. This will allow a load/store pair to vectorize even when it would not otherwise be long enough to satisfy the chain depth requirement.

llvm-svn: 149761
2012-02-04 04:14:04 +00:00
Chad Rosier 6d68c7cf79 [fast-isel] HandlePHINodesInSuccessorBlocks() can promite i8 and i16 types too.
llvm-svn: 149730
2012-02-04 00:39:19 +00:00
Chad Rosier 41f0e78b6c [fast-isel] Add support for FPToUI. Also add test cases for FPToSI.
llvm-svn: 149706
2012-02-03 20:27:51 +00:00
Chad Rosier a8a8ac5d47 [fast-isel] Add support for selecting UIToFP.
llvm-svn: 149704
2012-02-03 19:42:52 +00:00
Nadav Rotem 5399f4d6bf The type-legalizer often scalarizes code. One of the common patterns is extract-and-truncate.
In this patch we optimize this pattern and convert the sequence into extract op of a narrow type.
This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases.

llvm-svn: 149692
2012-02-03 13:18:25 +00:00
Akira Hatanaka f0b08445f6 Add a new MachineJumpTableInfo entry type, EK_GPRel64BlockAddress, which is
needed to emit a 64-bit gp-relative relocation entry. Make changes necessary
for emitting jump tables which have entries with directive .gpdword. This patch
does not implement the parts needed for direct object emission or JIT.

llvm-svn: 149668
2012-02-03 04:33:00 +00:00
Dan Gohman 2d54ce841c Fix SSAUpdaterImpl's RecordMatchingPHI to record exactly the
PHI nodes which were matched, rather than climbing up the
original PHI node's operands to rediscover PHI nodes for
recording, since the PHI nodes found that are not
necessarily part of the matched set.
This fixes rdar://10589171.

llvm-svn: 149654
2012-02-03 01:07:01 +00:00
Jim Grosbach 0ab54184d7 Revert "Disable InstCombine unsafe folding bitcasts of calls w/ varargs."
This reverts commit d0e277d272d517ca1cda368267d199f0da7cad95.

llvm-svn: 149647
2012-02-03 00:00:50 +00:00
Matt Beaumont-Gay 1e8d720743 Unix line endings
llvm-svn: 149615
2012-02-02 19:00:49 +00:00
NAKAMURA Takumi 9df3de538b Move test/CodeGen/Generic/2012-02-01-CoalescerBug.ll to CodeGen/ARM, for now. It requires TARGETS=arm.
I cannot reproduce a fixed issue with other targets.

llvm-svn: 149604
2012-02-02 11:44:58 +00:00
Elena Demikhovsky fb44980b41 Optimization for SIGN_EXTEND operation on AVX.
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.

llvm-svn: 149600
2012-02-02 09:10:43 +00:00
Lang Hames 0269caafa6 Set EFLAGS correctly in EmitLoweredSelect on X86.
llvm-svn: 149597
2012-02-02 07:48:37 +00:00
Lang Hames 3a20bc3652 PR11868. The previous loop in LiveIntervals::join would sometimes fall over if
more than two adjacent ranges needed to be merged. The new version should be
able to handle an arbitrary sequence of adjancent ranges.

llvm-svn: 149588
2012-02-02 05:37:34 +00:00
Andrew Trick 8523b16ff5 Instruction scheduling itinerary for Intel Atom.
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.

Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.

Adds a test to verify that the scheduler is working.

Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.

Patch by Preston Gurd!

llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Mon P Wang 9f05206659 Avoid creating an extract element to an illegal type after LegalizeTypes has run.
llvm-svn: 149548
2012-02-01 22:15:20 +00:00
Andrew Trick d06df96a7c VLIW specific scheduler framework that utilizes deterministic finite automaton (DFA).
This new scheduler plugs into the existing selection DAG scheduling framework. It is a top-down critical path scheduler that tracks register pressure and uses a DFA for pipeline modeling.

Patch by Sergei Larin!

llvm-svn: 149547
2012-02-01 22:13:57 +00:00
NAKAMURA Takumi fe8186f8b0 test/CodeGen/X86/avx-minmax.ll: Relax expressions for Win32 targets. YMM arguments are passed as indirect on Win32 x64.
llvm-svn: 149505
2012-02-01 14:35:29 +00:00
Elena Demikhovsky 824eed70a6 Passing AVX 256-bit structures in Win64 was wrong.
Fixed Win64 calling conventions.

llvm-svn: 149494
2012-02-01 10:46:14 +00:00
Elena Demikhovsky 0e48c70ba7 Optimization for "truncate" operation on AVX.
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles.

llvm-svn: 149485
2012-02-01 07:56:44 +00:00
Hal Finkel c34e51132c Add a basic-block autovectorization pass.
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).

llvm-svn: 149468
2012-02-01 03:51:43 +00:00
Jim Grosbach 9fa0481569 Disable InstCombine unsafe folding bitcasts of calls w/ varargs.
Changing arguments from being passed as fixed to varargs is unsafe, as
the ABI may require they be handled differently (stack vs. register, for
example).

Remove two tests which rely on the bitcast being folded into the direct
call, which is exactly the transformation that's unsafe.

llvm-svn: 149457
2012-02-01 00:08:17 +00:00
Kevin Enderby e1a12cf3ee Fixed a crash in llvm-mc for Mach-O when a symbol difference expression uses a
symbol from an assignment.  In this case the symbol did not have a fragment so
MCObjectWriter::IsSymbolRefDifferenceFullyResolved() should not have been
calling IsSymbolRefDifferenceFullyResolvedImpl() with a NULL fragment and should
just have returned false in that case.

llvm-svn: 149442
2012-01-31 23:02:57 +00:00
Craig Topper b85e40f738 Remove pcmpgt/pcmpeq intrinsics as clang is not using them.
llvm-svn: 149367
2012-01-31 06:52:44 +00:00
Bill Wendling d7cd9727ee Remove all references to the old EH.
There was always the current EH. -- Ministry of Truth

llvm-svn: 149335
2012-01-31 02:09:07 +00:00
Bill Wendling f98a6f4aab Update test to new EH model.
llvm-svn: 149333
2012-01-31 02:05:13 +00:00
Bill Wendling eb4adf3a27 Update test to new EH model.
llvm-svn: 149332
2012-01-31 02:04:20 +00:00
Chandler Carruth 2c469ff14a Chris's constant data sequence refactoring actually enabled printing
vectors of all one bits to be printed more cleverly in the AsmPrinter.
Unfortunately, the byte value for all one bits is the same with
-fsigned-char as the error return of '-1'. Force this to be the unsigned
byte value when returning it to avoid this problem, and update the test
case for the shiny new behavior.

Yay for building LLVM and Clang with -funsigned-char.

Chris, please review, and let me know if there is any reason to not
desire this change. It seems good on the surface, and certainly intended
based on the code written.

llvm-svn: 149299
2012-01-30 23:47:44 +00:00
Devang Patel 7cdb2ff6b5 Intel syntax. Adjust special code, used to recognize cmp<comparison code>{ss,sd,ps,pd}, for intel syntax.
llvm-svn: 149291
2012-01-30 22:47:12 +00:00
Devang Patel 9a9bb5c5db Intel syntax. Support .intel_syntax directive.
llvm-svn: 149270
2012-01-30 20:02:42 +00:00
Craig Topper 516cba3380 Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary.
llvm-svn: 149232
2012-01-30 07:50:31 +00:00
NAKAMURA Takumi 185de44959 CMake: Promote the testing targets out of folders on IDE.
llvm-svn: 149220
2012-01-30 03:15:47 +00:00
James Molloy b47489d4ef Ensure .AliasedSymbol() is called on all uses of getSymbol(). Affects ARM and MIPS ELF backends.
Fixes PR11877

llvm-svn: 149180
2012-01-28 15:58:32 +00:00
Rafael Espindola 004725876e Small improvement to the recursion detection logic from the previous commit.
llvm-svn: 149175
2012-01-28 06:22:14 +00:00
Rafael Espindola 72f5f170c6 Handle recursive variable definitions directly. This gives us better error
messages and allows us to fix PR11865.

llvm-svn: 149174
2012-01-28 05:57:00 +00:00
Rafael Espindola bb893fea6b Add r149110 back with a fix for when the vector and the int have the same
width.

llvm-svn: 149151
2012-01-27 23:33:07 +00:00
Rafael Espindola a4062624d1 Revert r149110 and add a testcase that was crashing since that revision.
Unfortunately I also had to disable constant-pool-sharing.ll the code it tests has been
updated to use the IL logic.

llvm-svn: 149148
2012-01-27 22:42:48 +00:00
Devang Patel 63fe5697f4 Intel Syntax: Parse mem operand with seg reg. QWORD PTR FS:[320]
llvm-svn: 149142
2012-01-27 19:48:28 +00:00
Matt Beaumont-Gay f3a9a38db4 Unix line endings
llvm-svn: 149115
2012-01-27 02:31:29 +00:00
Chris Lattner 111d6ee655 enhance constant folding to be able to constant fold bitcast of
ConstantVector's to integer type.

llvm-svn: 149110
2012-01-27 01:44:03 +00:00
Lang Hames 286fed5c8c Rewrite instruction operands in AdjustCopiesBackFrom. Fixes PR11861.
llvm-svn: 149097
2012-01-27 00:05:42 +00:00
Jakob Stoklund Olesen fc9dce25f7 Handle call-clobbered ymm registers on Win64.
The Win64 calling convention has xmm6-15 as callee-saved while still
clobbering all ymm registers.

Add a YMM_HI_6_15 pseudo-register that aliases the clobbered part of the
ymm registers, and mark that as call-clobbered.  This allows live xmm
registers across calls.

This hack wouldn't be necessary with RegisterMask operands representing
the call clobbers, but they are not quite operational yet.

llvm-svn: 149088
2012-01-26 22:59:28 +00:00
Chad Rosier 1a1531d65e Replace the use of isPredicable() with isPredicated() in
MachineBasicBlock::canFallThrough().  We're interested in the state of the
instruction (i.e., is this a barrier or not?), not if the instruction is
predicable or not.
rdar://10501092

llvm-svn: 149070
2012-01-26 18:24:25 +00:00
Jakob Stoklund Olesen 8c139a5125 Clear kill flags before propagating a copy.
The live range of the source register may be extended when a redundant
copy is eliminated. Make sure any kill flags between the two copies are
cleared.

This fixes PR11765.

llvm-svn: 149069
2012-01-26 17:52:15 +00:00
James Molloy 6685c08e5f Add support for the R_ARM_TARGET1 relocation, which should be given to relocations applied to all C++ constructors and destructors.
This enables the linker to match concrete relocation types (absolute or relative) with whatever library or C++ support code is being linked against.

llvm-svn: 149057
2012-01-26 09:25:43 +00:00
Victor Umansky 5f29b0e57b Fix for the following bug in AVX codegen for double-to-int conversions:
.	"fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode.
.	Currently for AVX mode for <4xdouble> and <8xdouble>  the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for .fp_to_sint. DAG node operation ) by AVX codegen. However they use round-to-nearest-even rounding mode.
.	Consequently, the conversion produces incorrect numbers.
 
The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode. 
As .fp_to_sint. DAG node operation is used only for lowering of  "fptosi" and "fptoui" IR instructions, the fix in X86InstrSSE.td definition file doesn.t have an impact on other LLVM flows.
 
The patch includes changes in the .td file, LIT test for the changes and a fix in a legacy LIT test (which produced asm code conflicting with LLVN IR spec). 

llvm-svn: 149056
2012-01-26 08:51:39 +00:00
Jakob Stoklund Olesen 4864a81aa3 Improve sub-register def handling in ProcessImplicitDefs.
This boils down to using MachineOperand::readsReg() more.

This fixes PR11829 where a use ended up after the first def when
lowering REG_SEQUENCE instructions involving IMPLICIT_DEFs.

llvm-svn: 148996
2012-01-25 23:36:27 +00:00
Anton Korobeynikov 7722a2d4e3 Properly emit ctors / dtors with priorities into desired sections
and let linker handle the rest.

This finally fixes PR5329

llvm-svn: 148990
2012-01-25 22:24:19 +00:00
Jim Grosbach 82f76d1275 ARM assemly parsing and validation of IT instruction.
"Although a Thumb2 instruction, the IT mnemonic shall be permitted in
ARM mode, and the condition verified to match the condition code(s)
on the following instruction(s)."

PR11853

llvm-svn: 148969
2012-01-25 19:52:01 +00:00
Nick Lewycky 70d50ee8fb Support pointer comparisons against constants, when looking at the inline-cost
savings from a pointer argument becoming an alloca. Sometimes callees will even
compare a pointer to null and then branch to an otherwise unreachable block!
Detect these cases and compute the number of saved instructions, instead of
bailing out and reporting no savings.

llvm-svn: 148941
2012-01-25 08:27:40 +00:00
Akira Hatanaka 01d3c42f90 Modify MipsFrameLowering::emitPrologue and emitEpilogue.
- Use MipsAnalyzeImmediate to expand immediates that do not fit in 16-bit.
- Change the types of variables so that they are sufficiently large to handle
  64-bit pointers.
- Emit instructions to set register $28 in a function prologue after
  instructions which store callee-saved registers have been emitted. 
 

llvm-svn: 148917
2012-01-25 04:12:04 +00:00
Akira Hatanaka 86d5fadd57 Lower 64-bit immediates using MipsAnalyzeImmediate that has just been added.
Add a test case to show fewer instructions are needed to load an immediate
with the new way of loading immediates.

llvm-svn: 148908
2012-01-25 03:01:35 +00:00
Jim Grosbach 086cbfac7d NEON VLD4(all lanes) assembly parsing and encoding.
llvm-svn: 148884
2012-01-25 00:01:08 +00:00
Jim Grosbach b78403ce48 NEON VLD3(all lanes) assembly parsing and encoding.
llvm-svn: 148882
2012-01-24 23:47:04 +00:00
Jakob Stoklund Olesen 1b8e437ab6 Set correct <def,undef> flags when lowering REG_SEQUENCE.
A REG_SEQUENCE instruction is lowered into a sequence of partial defs:

  %vreg7:ssub_0<def,undef> = COPY %vreg20:ssub_0
  %vreg7:ssub_1<def> = COPY %vreg2
  %vreg7:ssub_2<def> = COPY %vreg2
  %vreg7:ssub_3<def> = COPY %vreg2

The first def needs an <undef> flag to indicate it is the beginning of
the live range, while the other defs are read-modify-write.  Previously,
we depended on LiveIntervalAnalysis to notice and fix the missing
<def,undef>, but that solution was never robust, it was causing problems
with ProcessImplicitDefs and the lowering of chained REG_SEQUENCE
instructions.

This fixes PR11841.

llvm-svn: 148879
2012-01-24 23:28:42 +00:00
Akira Hatanaka 77dbd786c8 Pattern for f32 to i64 conversion.
llvm-svn: 148869
2012-01-24 22:05:25 +00:00
Jim Grosbach 35bc8f9159 ARM Darwin symbol ref differences w/o subsection-via-symbols.
When not using subsections via symbols, the assembler can resolve
symbol differences (including pcrel references) to non-local
labels at assembly time, not just those in the same atom.

llvm-svn: 148865
2012-01-24 21:45:25 +00:00
Devang Patel a410ed3ced Intel Syntax: Extend special hand coded logic, to recognize special instructions, for intel syntax.
llvm-svn: 148864
2012-01-24 21:43:36 +00:00
Akira Hatanaka 9f7ec1538f 64-bit sign extension in register instructions.
llvm-svn: 148862
2012-01-24 21:41:09 +00:00
Kostya Serebryany c11d1dd133 [asan] enable asan only for the functions that have Attribute::AddressSafety
llvm-svn: 148846
2012-01-24 19:34:43 +00:00
Jim Grosbach 8e2722cdb0 NEON VST4(one lane) assembly parsing and encoding.
llvm-svn: 148836
2012-01-24 18:53:13 +00:00
Jim Grosbach 14952a0e32 NEON VLD4(one lane) assembly parsing and encoding.
llvm-svn: 148832
2012-01-24 18:37:25 +00:00
Jakob Stoklund Olesen 60e70e8fcf Add an (interleave A, B, ...) SetTheory operator.
This will interleave the elements from two or more lists.

llvm-svn: 148824
2012-01-24 18:06:05 +00:00
Jim Grosbach 3cfef8d467 NEON Two-operand assembly aliases for VSRA.
llvm-svn: 148821
2012-01-24 17:55:36 +00:00
Jim Grosbach 9ca3a067c8 Remove redundant test file.
llvm-svn: 148820
2012-01-24 17:55:32 +00:00
Jim Grosbach 7ae12cc546 NEON Two-operand assembly aliases for VSLI.
llvm-svn: 148819
2012-01-24 17:49:15 +00:00
Jim Grosbach 7b6f0f67aa NEON Two-operand assembly aliases for VSRI.
llvm-svn: 148818
2012-01-24 17:46:58 +00:00
Jim Grosbach 1395b67e3b Tidy up.
llvm-svn: 148817
2012-01-24 17:46:54 +00:00
Elena Demikhovsky 0b0c5d8c4c ZERO_EXTEND operation is optimized for AVX.
v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions.

llvm-svn: 148803
2012-01-24 13:54:13 +00:00
Evgeniy Stepanov 33a7e2f2a1 An option to selectively enable part of ARM EHABI support.
This change adds an new option --arm-enable-ehabi-descriptors that
enables emitting unwinding descriptors. This provides a mode with a
working backtrace() without the (currently broken) exception support.

llvm-svn: 148800
2012-01-24 13:05:33 +00:00
Eric Christopher df1666e4e6 Fix the testcases for the previous patch.
rdar://10278198

llvm-svn: 148795
2012-01-24 10:11:49 +00:00
Jim Grosbach da70eac268 NEON VST4(multiple 4 element structures) assembly parsing.
llvm-svn: 148764
2012-01-24 00:58:13 +00:00
Jim Grosbach ed561fc850 NEON VLD4(multiple 4 element structures) assembly parsing.
llvm-svn: 148762
2012-01-24 00:43:17 +00:00
Chandler Carruth ed975232bc Revert r148686 (and r148694, a fix to it) due to a serious layering
violation -- MC cannot depend on CodeGen.

Specifically, the MCTargetDesc component of each target is actually
a subcomponent of the MC library. As such, it cannot depend on the
target-independent code generator, because MC itself cannot depend on
the target-independent code generator. This change moved a flag from the
ARM MCTargetDesc file ARMMCAsmInfo.cpp to the CodeGen layer in
ARMException.cpp, leaving behind an 'extern' to refer back to it. That
layering order isn't viable givin the constraints outlined above.
Commandline flags are designed to be static specifically to avoid these
types of bugs.

Fixing this is likely going to require some non-trivial refactoring.

llvm-svn: 148759
2012-01-24 00:30:17 +00:00
Jim Grosbach d3d36d9315 NEON VST3(single element from one lane) assembly parsing.
llvm-svn: 148755
2012-01-24 00:07:41 +00:00
Jim Grosbach 1a74724fc9 NEON VST3(multiple 3-element structures) assembly parsing.
llvm-svn: 148748
2012-01-23 23:45:44 +00:00
Jim Grosbach ac2af3ffab NEON VLD3(multiple 3-element structures) assembly parsing.
llvm-svn: 148745
2012-01-23 23:20:46 +00:00
Devang Patel cf893a437e Intel syntax: Robustify parsing of memory operand's displacement experssion.
llvm-svn: 148737
2012-01-23 22:35:25 +00:00
Jim Grosbach a8b444b08b NEON VLD3 lane-indexed assembly parsing and encoding.
llvm-svn: 148734
2012-01-23 21:53:26 +00:00
Rafael Espindola 3c47e37387 Add support for .cfi_signal_frame. Fixes pr11762.
llvm-svn: 148733
2012-01-23 21:51:52 +00:00
Jakob Stoklund Olesen 20948fab69 Fix PR11829. PostRA LICM was too aggressive.
This fixes a typo in r148589.

llvm-svn: 148724
2012-01-23 21:01:15 +00:00
Devang Patel e660fdd953 Intel syntax: Parse memory operand with empty base reg, e.g. DWORD PTR [4*RDI]
llvm-svn: 148721
2012-01-23 20:20:06 +00:00
Jim Grosbach d28ef9ac46 Simplify some NEON assembly pseudo definitions.
Let the generic token alias definitions handle the data subtype
suffices. We don't need explicit versions for each.

llvm-svn: 148718
2012-01-23 19:39:08 +00:00
Devang Patel 880bc1644b Intel syntax: Parse segment registers.
llvm-svn: 148712
2012-01-23 18:31:58 +00:00
Evgeniy Stepanov 482cdc4ebd An option to selectively enable parts of ARM EHABI support.
This change adds an new value to the --arm-enable-ehabi option that
disables emitting unwinding descriptors. This mode gives a working
backtrace() without the (currently broken) exception support.

llvm-svn: 148686
2012-01-23 07:57:39 +00:00
Nick Lewycky c31ceda7d9 Make Value::isDereferenceablePointer() handle unreachable code blocks. (This
returns false in the event the computation feeding into the pointer is
unreachable, which maybe ought to be true -- but this is at least consistent
with undef->isDereferenceablePointer().) Fixes PR11825!

llvm-svn: 148671
2012-01-23 00:05:17 +00:00
Anton Korobeynikov 5482b9f535 Add fused multiple+add instructions from VFPv4.
Patch by Ana Pazos!

llvm-svn: 148658
2012-01-22 12:07:33 +00:00
Devang Patel ce6a2ca8c8 Intel syntax: Robustify register parsing.
llvm-svn: 148591
2012-01-20 22:32:05 +00:00
Andrew Trick b9c822ab0b Handle a corner case with IV chain collection with bailout instead of assert.
Fixes PR11783: bad cast to AddRecExpr.

llvm-svn: 148572
2012-01-20 21:23:40 +00:00
Andrew Trick 16abc8a1e2 Test case comments missing from my previous checkin.
llvm-svn: 148571
2012-01-20 21:21:27 +00:00
Devang Patel d0930fff85 Intel syntax: Parse ... PTR [-8]
llvm-svn: 148570
2012-01-20 21:21:01 +00:00
Devang Patel f36613cb45 Intel syntax: For now, disable ambiguous JMP64pcrel32 for intel syntax.
llvm-svn: 148569
2012-01-20 21:14:06 +00:00
Bob Wilson 6c7aaec077 ARM vector any_extends need to be selected to vmovl. <rdar://problem/10723651>
We have patterns for vector sext and zext operations but were missing
anyext.  Without those patterns, codegen will fail when the selection DAG
has any_extend nodes.

llvm-svn: 148568
2012-01-20 20:59:56 +00:00
Jim Grosbach 90f5780fc1 VST2 four-register w/ update pseudos for fixed/register update.
rdar://10724489

llvm-svn: 148560
2012-01-20 19:16:00 +00:00
Jim Grosbach a9d36fbca7 NEON use vmov.i32 to splat some f32 values into vectors.
For bit patterns that aren't representable using the 8-bit floating point
representation for vmov.f32, but are representable via vmov.i32, treat
the .f32 syntax as an alias. Most importantly, this covers the case
'vmov.f32 Vd, #0.0'.

rdar://10616677

llvm-svn: 148556
2012-01-20 18:09:51 +00:00
Nick Lewycky e8415fea4b Fix CountCodeReductionForAlloca to more accurately represent what SROA can and
can't handle. Also don't produce non-zero results for things which won't be
transformed by SROA at all just because we saw the loads/stores before we saw
the use of the address.

llvm-svn: 148536
2012-01-20 08:35:20 +00:00
Andrew Trick c908b43d9f SCEVExpander fixes. Affects LSR and indvars.
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:

- Always used properlyDominates to check safe code hoisting.

- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.

- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.

- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.

Fixes PR11783: SCEVExpander assert.

llvm-svn: 148535
2012-01-20 07:41:13 +00:00
Craig Topper 3469212c82 Add support for selecting 256-bit PALIGNR.
llvm-svn: 148532
2012-01-20 05:53:00 +00:00
Eli Friedman b359e67d2d Remove a low-quality test which was failing on Windows; test/CodeGen/X86/sret.ll is a better test for the relevant behavior.
llvm-svn: 148526
2012-01-20 02:06:40 +00:00
Eli Friedman 32c7c25dcb Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff.
llvm-svn: 148513
2012-01-20 00:05:46 +00:00
Dan Gohman 8ee108bf98 Set the "tail" flag on pattern-matched objc_storeStrong calls.
rdar://10531041.

llvm-svn: 148490
2012-01-19 19:14:36 +00:00
Devang Patel f83dcfd052 Post process 'and', 'sub' instructions and select better encoding, if available.
llvm-svn: 148489
2012-01-19 18:40:55 +00:00
Devang Patel 2529dd9e00 Intel syntax: There is no need to create unary expr for simple negative displacement.
llvm-svn: 148486
2012-01-19 18:15:51 +00:00
Devang Patel 4a62ff9bcb Post process 'xor', 'or' and 'cmp' instructions and select better encoding, if available.
llvm-svn: 148485
2012-01-19 17:53:25 +00:00
Evgeniy Stepanov 4c7eb477b5 Emit ARM EHABI unwinding instructions for 3 more Thumb instructions.
llvm-svn: 148473
2012-01-19 12:53:06 +00:00
Jim Grosbach 29ecaa944d Add testcase.
llvm-svn: 148454
2012-01-19 01:36:59 +00:00
Nick Lewycky 9e2c7f659e Space after punctuation.
llvm-svn: 148451
2012-01-19 01:13:47 +00:00
Nick Lewycky ecc0084f72 Add a TargetOption for disabling tail calls.
llvm-svn: 148442
2012-01-19 00:34:10 +00:00
Jim Grosbach 94298a906a Thumb2 alternate syntax for LDR(literal) and friends.
Explicit pc-relative syntax. For example, "ldrb r2, [pc, #-22]".

rdar://10250964

llvm-svn: 148432
2012-01-18 22:46:46 +00:00
Devang Patel de47cced25 Process instructions after match to select alternative encoding which may be more desirable.
llvm-svn: 148431
2012-01-18 22:42:29 +00:00
Jim Grosbach cb80eb2e75 Thumb2 relaxation for LDR(literal).
If the fixup is out of range for the Thumb1 instruction, relax it
to the Thumb2 encoding instead.

rdar://10711829

llvm-svn: 148424
2012-01-18 21:54:16 +00:00
Dan Gohman 82041c2e60 Use llvm.global_ctors to locate global constructors instead
of recognizing them by name.

llvm-svn: 148416
2012-01-18 21:19:38 +00:00
Nadav Rotem 3b8f0cc9fa Fix a bug in the type-legalization of vector integers. When we bitcast one vector type to another, we must not bitcast the result if one type is widened while the other is promoted.
llvm-svn: 148383
2012-01-18 08:33:18 +00:00
Andrew Trick c193b16ea2 Test case rename
llvm-svn: 148344
2012-01-17 22:27:45 +00:00
Jim Grosbach 4045507fea MC tweak symbol difference resolution for non-local symbols.
When the non-local symbol in the expression is in the same fragment
as the second symbol, the assembler can still evaluate the expression
without needing a relocation.

For example, on ARM:
_foo:
	ldr lr, (_foo - 4)

rdar://10348687

llvm-svn: 148341
2012-01-17 22:14:39 +00:00
Jim Grosbach 354f48f201 Tidy up.
llvm-svn: 148339
2012-01-17 22:03:42 +00:00
Devang Patel c9ed518792 Intel syntax: Fix parser match class to check memory operand size.
llvm-svn: 148338
2012-01-17 21:48:03 +00:00
Nadav Rotem fb6ddee0e9 Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT.
llvm-svn: 148337
2012-01-17 21:44:01 +00:00
Devang Patel a7143b6a2b Intel syntax: Parse "BYTE PTR [RDX + RCX]"
llvm-svn: 148334
2012-01-17 21:25:10 +00:00
Dan Gohman e7a243fea5 Add a new ObjC ARC optimization pass to eliminate unneeded
autorelease push+pop pairs.

llvm-svn: 148330
2012-01-17 20:52:24 +00:00
Devang Patel 8b39be79ad Intel syntax: Do not unncessarily create plus expression for memory operand displacement.
llvm-svn: 148321
2012-01-17 19:08:07 +00:00
Devang Patel a77c03be54 Intel syntax: Ignore mnemonic aliases.
llvm-svn: 148316
2012-01-17 18:30:45 +00:00
Eli Bendersky 83c0088faa Remove "XFAIL: arm" from test/ExecutionEngine/test-return.ll
The test passes on ARM bots

llvm-svn: 148315
2012-01-17 18:21:05 +00:00
Devang Patel 41b9ddeb7a Intel syntax: Robustify memory operand parsing.
llvm-svn: 148312
2012-01-17 18:00:18 +00:00
Eli Bendersky a8ad6e5666 Additional ExecutionEngine tests, as part of bringing up the MCJIT on ELF
implementation.
Currently lit still executes ExecutionEngine tests with JIT (not MCJIT) by
default. MCJIT tests can be executed manually by calling llvm-lit with
--param jit_impl=mcjit

llvm-svn: 148299
2012-01-17 09:14:54 +00:00
Nadav Rotem 86e5390dbf Fix 11769.
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.

llvm-svn: 148298
2012-01-17 09:13:19 +00:00
Andrew Trick 12728f04ca LSR fix: broaden the check for loop preheaders.
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.

llvm-svn: 148288
2012-01-17 06:45:52 +00:00
Hal Finkel 8606e3c7e3 AggressiveAntiDepBreaker needs to skip debug values because a debug value does not have a corresponding SUnit
llvm-svn: 148260
2012-01-16 22:53:41 +00:00
Eli Friedman 206ca569aa Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768.
llvm-svn: 148240
2012-01-16 16:42:21 +00:00
Eli Bendersky 4c647587b1 Adding a basic ELF dynamic loader and MC-JIT for ELF. Functionality is currently basic and will be enhanced with future patches.
Patch developed by Andy Kaylor and Daniel Malea. Reviewed on llvm-commits.

llvm-svn: 148231
2012-01-16 08:56:09 +00:00
Nadav Rotem 57935243bd [AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits.
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.

Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.

llvm-svn: 148225
2012-01-15 19:27:55 +00:00
Chandler Carruth fd63436553 Relax the FileCheck assertion a bit -- all we really care about is that
we're loading from the global array, not how it is spelled in the asm.
This should fix the MSVC bots.

llvm-svn: 148214
2012-01-15 09:38:59 +00:00
Chandler Carruth 4bb2eb15d4 FileCheck-ize a test, make it more specific to directly test the shift
removal desired.

llvm-svn: 148213
2012-01-15 09:32:57 +00:00
Andrew Trick 23ef0d6c40 Fix a corner case hit by redundant phi elimination running after LSR.
Fixes PR11761: bad IR w/ redundant Phi elim

llvm-svn: 148177
2012-01-14 03:17:23 +00:00
Evan Cheng 6bb95253eb After r147827 and r147902, it's now possible for unallocatable registers to be
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
        cbnz    r6, LBB89_12
...
LBB89_12:
        ble     LBB89_1

The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.

rdar://10676853

llvm-svn: 148168
2012-01-14 01:53:46 +00:00
Chad Rosier d027d528b5 Cleanup test case by adding checks for test names.
llvm-svn: 148166
2012-01-14 01:46:51 +00:00
Rafael Espindola 78a6477b28 Add a test showing how the Leh_func_endN symbol is used.
llvm-svn: 148161
2012-01-14 00:12:59 +00:00
Devang Patel 5d85276e30 Add new test.
llvm-svn: 148128
2012-01-13 18:45:31 +00:00
NAKAMURA Takumi 1b845f712a test/CodeGen/ARM/test-sharedidx.ll: Fix for -Asserts.
llvm-svn: 148107
2012-01-13 07:03:55 +00:00
Craig Topper 9f14d9f939 Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32.
llvm-svn: 148106
2012-01-13 06:59:47 +00:00
Evan Cheng fa8326334b DAGCombine's logic for forming pre- and post- indexed loads / stores were being
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
  ldr r0, [r2]
  ldr r1, [r2, #4]
=>
  ldr r0, [r2], #4
  ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.

rdar://10674430

llvm-svn: 148086
2012-01-13 01:37:24 +00:00
Dan Gohman 728db4997a Implement proper ObjC ARC objc_retainBlock "escape" analysis, so that
the optimizer doesn't eliminate objc_retainBlock calls which are needed
for their side effect of copying blocks onto the heap.
This implements rdar://10361249.

llvm-svn: 148076
2012-01-13 00:39:07 +00:00
Elena Demikhovsky 060f6ccdb8 Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&&  "Op 1 of shuffle should not be undef"' failed.
Added a test.

llvm-svn: 148044
2012-01-12 20:33:10 +00:00
Rafael Espindola 9a167eeaff Add error-reporting tests for platforms that don't support segmented stacks.
Patch by Brian Anderson.

llvm-svn: 148042
2012-01-12 20:26:13 +00:00
Rafael Espindola 00e861ed57 Support segmented stacks on 64-bit FreeBSD.
This patch uses tcb_spare field in the tcb structure to store info.
Patch by Jyun-Yan You.

llvm-svn: 148041
2012-01-12 20:24:30 +00:00
Rafael Espindola 10745d3381 Support segmented stacks on win32.
Uses the pvArbitrary slot of the TIB, which is reserved for applications. We
only support frames with a static size.

llvm-svn: 148040
2012-01-12 20:22:08 +00:00
Devang Patel b04a09a515 Remove test case, as Chris suggested.
llvm-svn: 148039
2012-01-12 19:54:02 +00:00
Devang Patel 0014e38a8b Add test case to check intel syntax parsing.
llvm-svn: 148034
2012-01-12 18:40:46 +00:00
Nadav Rotem 0a0a829bea Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors.
The attached testcase was stuck in an endless loop.

llvm-svn: 148027
2012-01-12 15:31:55 +00:00
Benjamin Kramer 5b3aa60b44 X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63.
llvm-svn: 148024
2012-01-12 12:41:34 +00:00
Nadav Rotem b5ce6ee835 On AVX, we can load v8i32 at a time. The bug happens when two uneven loads are used.
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32 
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen 
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.

llvm-svn: 147964
2012-01-11 20:19:17 +00:00
Bill Wendling bb3c4ad135 Check to make sure that the CFString's back store ends up in the correct section.
llvm-svn: 147961
2012-01-11 19:33:37 +00:00
Rafael Espindola d90466bcbf Support segmented stacks on mac.
This uses TLS slot 90, which actually belongs to JavaScriptCore. We only support
frames with static size
Patch by Brian Anderson.

llvm-svn: 147960
2012-01-11 19:00:37 +00:00
Rafael Espindola 1e1b797989 Split segmented stacks tests into tests for static- and dynamic-size frames.
Patch by Brian Anderson.

llvm-svn: 147959
2012-01-11 18:51:03 +00:00
Rafael Espindola 4eecacb9c8 Generate the segmented stack prologue for fastcc too.
Patch by Brian Anderson.

llvm-svn: 147958
2012-01-11 18:41:19 +00:00
Chandler Carruth 3212a34269 Revert r147945 which disabled an addressing mode transformation. I had
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.

If anyone else is seeing failures, please let me know!

llvm-svn: 147957
2012-01-11 18:36:12 +00:00
Rafael Espindola 2b89448d60 Use unsigned comparison in segmented stack prologue.
This is a comparison of two addresses, and GCC does the comparison unsigned.

Patch by Brian Anderson.

llvm-svn: 147954
2012-01-11 18:23:35 +00:00
Rafael Espindola 6635ae1c17 Explicitly set the scale to 1 on some segstack prologue instrs.
Patch by Brian Anderson.

llvm-svn: 147952
2012-01-11 18:14:03 +00:00
Kevin Enderby 6223cf72e6 The error check for using -g with a .s file already containing dwarf .file
directives was in the wrong place and getting triggered incorectly with a
cpp .file directive.  This change fixes that and adds a test case.

llvm-svn: 147951
2012-01-11 18:04:47 +00:00
Jan Sjödin 21f83d9f36 Add XOP Intrinsics and tests
llvm-svn: 147949
2012-01-11 15:20:20 +00:00
Nadav Rotem baae7e4577 Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
llvm-svn: 147948
2012-01-11 14:07:51 +00:00
Duncan Sands 0bf46b5363 Don't try to create a GEP when the pointee type is unsized (such GEPs
are invalid).  Fixes a crash on array1.C from the GCC testsuite when
compiled with dragonegg.

llvm-svn: 147946
2012-01-11 12:20:08 +00:00
Chandler Carruth 9bc48e5215 Disable the transformation I added in r147936 to see if it fixes some
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.

llvm-svn: 147945
2012-01-11 12:17:47 +00:00
Jakob Stoklund Olesen 6039983755 Fix undefined code and reenable test case.
I don't think the compact encoding code is right, but at least is has
defined behavior now.

llvm-svn: 147938
2012-01-11 09:08:04 +00:00
Chandler Carruth 55b2cdee26 Teach the X86 instruction selection to do some heroic transforms to
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.

llvm-svn: 147936
2012-01-11 08:41:08 +00:00
Stepan Dyatkovskiy 8216569812 Improved compile time:
1. Size heuristics changed. Now we calculate number of unswitching
branches only once per loop.
2. Some checks was moved from UnswitchIfProfitable to
processCurrentLoop, since it is not changed during processCurrentLoop
iteration. It allows decide to skip some loops at an early stage.
Extended statistics:
- Added total number of instructions analyzed.

llvm-svn: 147935
2012-01-11 08:40:51 +00:00
NAKAMURA Takumi 0e60839e11 llvm/test/CodeGen/X86/zext-fold.ll: Relax an expression in stack offset.
llvm-svn: 147928
2012-01-11 07:34:22 +00:00
NAKAMURA Takumi 9e823f6ae1 llvm/test/CodeGen/X86/sub-with-overflow.ll: Add explicit -mtriple=i686-linux.
llvm-svn: 147927
2012-01-11 07:34:14 +00:00
Rafael Espindola 647841b181 Add big endian mips support. Based on a patch by Jack Carter.
llvm-svn: 147924
2012-01-11 04:04:14 +00:00
Rafael Espindola 870c4e92b9 Add the skeleton of an asm parser for mips.
llvm-svn: 147923
2012-01-11 03:56:41 +00:00
Andrew Trick 642f0f6a40 ARM Ld/St Optimizer fix.
Allow LDRD to be formed from pairs with different LDR encodings. This was the original intention of the pass. Somewhere along the way, the LDR opcodes were refined which broke the optimization. We really don't care what the original opcodes are as long as they both map to the same LDRD and the immediate still fits.

Fixes rdar://10435045 ARMLoadStoreOptimization cannot handle mixed LDRi8/LDRi12

llvm-svn: 147922
2012-01-11 03:56:08 +00:00
Jakob Stoklund Olesen 05ff7f06fb Disable test that seems to expose an unrelated Linux issue.
llvm-svn: 147921
2012-01-11 03:42:27 +00:00
Jakob Stoklund Olesen 8b1d023a4a Detect when a value is undefined on an edge to a landing pad.
Consider this code:

int h() {
  int x;
  try {
    x = f();
    g();
  } catch (...) {
    return x+1;
  }
  return x;
}

The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.

SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.

Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().

<rdar://problem/10664933>

llvm-svn: 147912
2012-01-11 02:07:05 +00:00
Bill Wendling c79155192d If the global variable is removed by the linker, then don't constant merge it
with other symbols.

An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>

llvm-svn: 147899
2012-01-11 00:13:08 +00:00
Eric Christopher 43a1182975 Don't avoid recursing for pointer types, just reference types. Expand on
the comment.

Fixes constvars.exp on the gdb test builder.

llvm-svn: 147897
2012-01-11 00:01:29 +00:00
Chad Rosier a415140a2c Add test case for r147881.
llvm-svn: 147891
2012-01-10 23:09:53 +00:00
Joerg Sonnenberger 96cd35cf6d Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes.
Add a test that checks the stack alignment of a simple function for
Darwin, Linux and NetBSD for 32bit and 64bit mode.

llvm-svn: 147888
2012-01-10 22:43:53 +00:00
Jakob Stoklund Olesen 20f1dd5faf Consider unknown alignment caused by OptimizeThumb2Instructions().
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms.  This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.

Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.

Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.

Add a single large test case that will hopefully exercise many parts of
the constant island pass.

<rdar://problem/10670199>

llvm-svn: 147885
2012-01-10 22:32:14 +00:00
Jim Grosbach 74ac7d50a1 ARM updating VST2 pseudo-lowering fixed vs. register update.
rdar://10663487

llvm-svn: 147876
2012-01-10 21:11:12 +00:00
Kevin Enderby 8d4a2204b7 Various crash reporting tools have a problem with the dwarf generated for
assembly source when it generates the TAG_subprogram dwarf debug info for
the labels that have nothing between them as in this bit of assembly source:

% cat ZeroLength.s 
_func1:
_func2:
 nop

One solution would be to not emit the subsequent labels with the same address
and use the next label with a different address or the end of the section for
the AT_high_pc value of the TAG_subprogram.

Turns out in llvm-mc it is not possible in all cases to determine of two
symbols have the same value at the point we put out the TAG_subprogram dwarf
debug info.

So we will have llvm-mc instead of putting out TAG_subprogram's put out
DW_TAG_label's.  And the DW_TAG_label does not have a AT_high_pc value which
avoids the problem.

This commit is only the functional change to make the diffs clear as to what is
really being changed.  The next commit will be to clean up the names of such
things like MCGenDwarfSubprogramEntry to something like MCGenDwarfLabelEntry.

rdar://10666925

llvm-svn: 147860
2012-01-10 17:52:29 +00:00
Nadav Rotem 61bdf79035 Fix a bug in the legalization of shuffle vectors. When we emulate shuffles using BUILD_VECTORS we may be using a BV of different type. Make sure to cast it back.
llvm-svn: 147851
2012-01-10 14:28:46 +00:00
Craig Topper 430f3f1bd6 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
llvm-svn: 147844
2012-01-10 08:23:59 +00:00
Evan Cheng 0be4144a68 Allow machine-cse to look across MBB boundary when cse'ing instructions that
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.

rdar://10660865

llvm-svn: 147827
2012-01-10 02:02:58 +00:00
Andrew Trick d5d2db9af9 Enable LSR IV Chains with sufficient heuristics.
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.

Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.

Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.

On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.

llvm-svn: 147826
2012-01-10 01:45:08 +00:00
Andrew Trick 248d410e3e Adding IV chain generation to LSR.
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.

In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.

llvm-svn: 147801
2012-01-09 21:18:52 +00:00
Benjamin Kramer f9d0cc0160 InstCombine: Teach foldLogOpOfMaskedICmpsHelper that sign bit tests are bit tests.
This subsumes several other transforms while enabling us to catch more cases.

llvm-svn: 147777
2012-01-09 17:23:27 +00:00
Chandler Carruth ea27c16adc Cleanup and FileCheck-ize a test.
llvm-svn: 147772
2012-01-09 09:44:26 +00:00
Craig Topper ef7f5bf8c9 Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing.
llvm-svn: 147767
2012-01-09 06:52:46 +00:00
Rafael Espindola f28213ca01 Don't print an unused label before .cfi_endproc.
llvm-svn: 147763
2012-01-09 00:17:29 +00:00
Craig Topper 744f6311d3 Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level.
llvm-svn: 147762
2012-01-09 00:11:29 +00:00
Benjamin Kramer 6609f741b9 Tweak my last commit to be less conservative about uses.
We still save an instruction when just the "and" part is replaced.
Also change the code to match comments more closely.

llvm-svn: 147753
2012-01-08 21:12:51 +00:00
Benjamin Kramer da37e15345 InstCombine: If we have a bit test and a sign test anded/ored together, merge the sign bit into the bit test.
This is common in bit field code, e.g. checking if the first or the last bit of a bit field is set.

llvm-svn: 147749
2012-01-08 18:32:24 +00:00
Victor Umansky 540651cf59 Reverted commit #147601 upon Evan's request.
llvm-svn: 147748
2012-01-08 17:20:33 +00:00
Rafael Espindola 382412032c Don't print a label before .cfi_startproc when we don't need to. This makes
the produce assembly when using CFI just a bit more readable.

llvm-svn: 147743
2012-01-07 22:42:19 +00:00
Jakob Stoklund Olesen 8cdce7e690 Use getRegForValue() to materialize the address of ARM globals.
This enables basic local CSE, giving us 20% smaller code for
consumer-typeset in -O0 builds.

<rdar://problem/10658692>

llvm-svn: 147720
2012-01-07 04:07:22 +00:00
Andrew Trick 732ad80dbb LSR: Don't optimize loops if an outer loop has no preheader.
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.

llvm-svn: 147718
2012-01-07 03:16:50 +00:00
Rafael Espindola 0708209642 Split Finish into Finish and FinishImpl to have a common place to do end of
file error checking. Use that to error on an unfinished cfi_startproc.

The error is not nice, but is already better than a segmentation fault.

llvm-svn: 147717
2012-01-07 03:13:18 +00:00
Evan Cheng 00b1a3cd7e Added a late machine instruction copy propagation pass. This catches
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
        movl    %eax, %ecx
        movl    %ecx, %eax
        ret

The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)

This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.

rdar://10428165
rdar://10640363

llvm-svn: 147716
2012-01-07 03:02:36 +00:00
Jakob Stoklund Olesen 68f034ee1a Use movw+movt in ARMFastISel::ARMMaterializeGV.
This eliminates a lot of constant pool entries for -O0 builds of code
with many global variable accesses.

This speeds up -O0 codegen of consumer-typeset by 2x because the
constant island pass no longer has to look at thousands of constant pool
entries.

<rdar://problem/10629774>

llvm-svn: 147712
2012-01-07 01:47:05 +00:00
Andrew Trick 5adedf5d47 Extended replaceCongruentPhis to handle mixed phi types.
llvm-svn: 147707
2012-01-07 01:12:09 +00:00
Eric Christopher c206d46709 Make the 'x' constraint work for AVX registers as well.
Fixes rdar://10614894

llvm-svn: 147704
2012-01-07 01:02:09 +00:00
Andrew Trick cbf2fe066a comment typo
llvm-svn: 147701
2012-01-07 00:29:20 +00:00
Jakob Stoklund Olesen 68a922c0e9 Enable aligned NEON spilling by default.
Experiments show this to be a small speedup for modern ARM cores.

llvm-svn: 147689
2012-01-06 22:19:37 +00:00
Dan Gohman 5ab9c0a927 Fix SpeculativelyExecuteBB to either speculate all or none of the phis
present in the bottom of the CFG triangle, as the transformation isn't
ever valuable if the branch can't be eliminated.

Also, unify some heuristics between SimplifyCFG's multiple
if-converters, for consistency.

This fixes rdar://10627242.

llvm-svn: 147630
2012-01-05 23:58:56 +00:00
Eli Friedman 55fa49f32d PR11705, part 2: globalopt shouldn't put inttoptr/ptrtoint operations into global initializers if there's an implied extension or truncation.
llvm-svn: 147625
2012-01-05 23:03:32 +00:00
Rafael Espindola 23f8d64b58 Link symbols with different visibilities according to the rules in the
System V Application Binary Interface. This lets us use
-fvisibility-inlines-hidden with LTO.
Fixes PR11697.

llvm-svn: 147624
2012-01-05 23:02:01 +00:00
Dan Gohman 5267211899 Revert r56315. When the instruction to speculate is a load, this
code can incorrectly move the load across a store. This never
happens in practice today, but only because the current
heuristics accidentally preclude it.

llvm-svn: 147623
2012-01-05 22:54:35 +00:00
Chandler Carruth e041a30bb9 Prevent a DAGCombine from firing where there are two uses of
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.

llvm-svn: 147604
2012-01-05 11:05:55 +00:00
Chandler Carruth 6bc151f5d4 Cleanup and FileCheck-ize a test.
llvm-svn: 147603
2012-01-05 11:05:47 +00:00
Victor Umansky 9255b6d9fe Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)

Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
2012-01-05 08:46:19 +00:00
Benjamin Kramer aca1885695 FileCheck hygiene.
llvm-svn: 147580
2012-01-05 00:43:34 +00:00
Jakob Stoklund Olesen d110e2a83f Reapply r146997, "Heed spill slot alignment on ARM."
Now that canRealignStack() understands frozen reserved registers, it is
safe to use it for aligned spill instructions.

It will only return true if the registers reserved at the beginning of
register allocation allow for dynamic stack realignment.

<rdar://problem/10625436>

llvm-svn: 147579
2012-01-05 00:26:57 +00:00
Nick Lewycky 0c48afa0ed Teach instcombine all sorts of great stuff about shifts that have exact, nuw or
nsw bits on them.

llvm-svn: 147528
2012-01-04 09:28:29 +00:00
NAKAMURA Takumi 91a3f886ef test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov".
llvm-svn: 147521
2012-01-04 03:52:23 +00:00
Akira Hatanaka c669d7a6db Have getRegForInlineAsmConstraint return the correct register class when target
is Mips64.

llvm-svn: 147516
2012-01-04 02:45:01 +00:00
Evan Cheng 801d98b3f0 Fix more places which should be checking for iOS, not darwin.
llvm-svn: 147513
2012-01-04 01:55:04 +00:00
Evan Cheng 104dbb0fd1 For x86, canonicalize max
(x > y) ? x : y
=>
(x >= y) ? x : y

So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0

This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl   %esi, %edi
testl  %edi, %edi
movl   $0, %eax
cmovgl %edi, %eax
=>
xorl   %eax, %eax
subl   %esi, $edi
cmovsl %eax, %edi

rdar://10633221

llvm-svn: 147512
2012-01-04 01:41:39 +00:00
Kostya Serebryany 842ae27ae3 [asan] one more test for asan instrumentation: (*a)++ should be instrumented only once.
llvm-svn: 147509
2012-01-04 01:02:14 +00:00
Jakob Stoklund Olesen 1b7f2a7638 Revert r146997, "Heed spill slot alignment on ARM."
This patch caused a miscompilation of oggenc because a frame pointer was
suddenly needed halfway through register allocation.

<rdar://problem/10625436>

llvm-svn: 147487
2012-01-03 22:34:35 +00:00
Nadav Rotem 6d31bac85e Revert 147426 because it caused pr11696.
llvm-svn: 147485
2012-01-03 22:19:42 +00:00
Nadav Rotem 1e7dda13c8 Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted.
llvm-svn: 147484
2012-01-03 22:12:28 +00:00
Chad Rosier 493c1b3152 Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
rdar://10594409

llvm-svn: 147481
2012-01-03 21:05:52 +00:00
Elena Demikhovsky 8ec21a2801 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147445
2012-01-03 11:59:04 +00:00
Andrew Trick cbcc98fb50 Fix SCEVExpander to handle loops with no preheader when LSR gives it a
"phony" insertion point.

Fixes rdar://10619599: "SelectionDAGBuilder shouldn't visit PHI nodes!" assert

llvm-svn: 147439
2012-01-02 21:25:10 +00:00
Nadav Rotem 6c7a0e6c8b Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit.
llvm-svn: 147426
2012-01-02 08:05:46 +00:00
Craig Topper b910984458 Allow CRC32 instructions to be selected when AVX is enabled.
llvm-svn: 147411
2012-01-01 19:51:58 +00:00
Craig Topper 1c064e0a89 Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers.
llvm-svn: 147409
2012-01-01 19:40:22 +00:00
Rafael Espindola d3df940169 Revert 147399. It broke CodeGen/ARM/vext.ll.
llvm-svn: 147400
2012-01-01 17:36:23 +00:00
Elena Demikhovsky 67f80c3432 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.

llvm-svn: 147399
2012-01-01 16:22:47 +00:00
Craig Topper d51092d93a Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load.
llvm-svn: 147393
2011-12-31 23:24:49 +00:00
Craig Topper 0e796fee11 Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected.
llvm-svn: 147392
2011-12-31 23:15:11 +00:00
Nick Lewycky b59008c694 Make use of the exact bit when optimizing '(X >>exact 3) << 1' to eliminate the
'and' that would zero out the trailing bits, and to produce an exact shift
ourselves.

llvm-svn: 147391
2011-12-31 21:30:22 +00:00
Craig Topper 2ba766ae84 Add disassembler support for VPERMIL2PD and VPERMIL2PS.
llvm-svn: 147368
2011-12-30 06:23:39 +00:00
Craig Topper 03a0beda88 Add FMA4 instructions to disassembler.
llvm-svn: 147367
2011-12-30 05:20:36 +00:00
Craig Topper 6c08930c5e Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms.
llvm-svn: 147361
2011-12-30 02:18:36 +00:00
Craig Topper 2ca79b9d4b Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere.
llvm-svn: 147360
2011-12-30 01:49:53 +00:00
Hal Finkel 692d1fb355 Cleanup stack/frame register define/kill states. This fixes two bugs:
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).

2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.

llvm-svn: 147359
2011-12-30 00:34:00 +00:00
Rafael Espindola 4ea99816ef Implement cfi_restore. Patch by Brian Anderson!
llvm-svn: 147356
2011-12-29 21:43:03 +00:00
Craig Topper d773607eee Fix execution domains for PS/PD FMA3 instructions. Add SS/SD forms o FMA3 instructions.
llvm-svn: 147353
2011-12-29 20:43:40 +00:00
Rafael Espindola ef4aa35164 Implement .cfi_escape. Patch by Brian Anderson!
llvm-svn: 147352
2011-12-29 20:24:47 +00:00
Craig Topper 8cab06a214 Expose FMA3 instructions to the disassembler.
llvm-svn: 147351
2011-12-29 20:03:14 +00:00
Nick Lewycky 4c378a4453 Change CaptureTracking to pass a Use* instead of a Value* when a value is
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.

Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.

llvm-svn: 147327
2011-12-28 23:24:21 +00:00