3837 Commits

Author SHA1 Message Date
David Blaikie 33111dfea0 Remove the (apparently) unnecessary debug info metadata indirection.
The main lists of debug info metadata attached to the compile_unit had an extra
layer of metadata nodes they went through for no apparent reason. This patch
removes that (& still passes just as much of the GDB 7.5 test suite). If anyone
can show evidence as to why these extra metadata nodes are there I'm open to
reverting this patch & documenting why they're there.

llvm-svn: 174266
2013-02-02 05:56:24 +00:00
Shuxin Yang cadd8a068e rdar://13126763
Fix a bug in DAGCombine. The symptom is mistakenly optimizing expression
"x + x*x" into "x * 3.0".

llvm-svn: 174239
2013-02-02 00:22:03 +00:00
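
A quick numeric check of the symptom described in r174239: "x + x*x" and "x * 3.0" agree only at x = 0 and x = 2, so the rewrite is not valid in general. This is plain Python arithmetic rather than LLVM code, and the function names are made up for illustration.

```python
def original(x: float) -> float:
    """The expression as written in the source program."""
    return x + x * x


def miscombined(x: float) -> float:
    """What the buggy combine reportedly produced."""
    return x * 3.0


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 5.0):
        print(x, original(x), miscombined(x), original(x) == miscombined(x))
```
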
David Sehr 8114a7a651 Two changes relevant to LEA and x32:
1) allows the use of RIP-relative addressing in 32-bit LEA instructions under
   x86-64 (ILP32 and LP64)
2) separates the size of address registers in 64-bit LEA instructions from
   control by ILP32/LP64.

llvm-svn: 174208
2013-02-01 19:28:09 +00:00
Lang Hames dd47804394 When lowering memcpys to loads and stores, make sure we don't promote alignments
past the natural stack alignment.

llvm-svn: 174085
2013-01-31 20:23:43 +00:00
Eric Christopher 4e3e94c13d Check and allow floating point registers to select the size of the
register for inline asm. This conforms to how gcc allows for effective
casting of inputs into gprs (fprs is already handled).

llvm-svn: 174008
2013-01-31 00:50:46 +00:00
Eli Bendersky 6c84b90b70 Replace some more greps with FileChecks in tests
llvm-svn: 174006
2013-01-31 00:44:12 +00:00
Eli Bendersky a320e00e74 Rewrite this test properly with a FileCheck instead of greps
llvm-svn: 173997
2013-01-31 00:11:52 +00:00
Evan Cheng 9449ec956f Forgot the test case before.
llvm-svn: 173988
2013-01-30 22:57:00 +00:00
Benjamin Kramer 05cc93964a When the legalizer is splitting vector shifts, the result may not have the right shift amount type.
Fix that by adding a cast to the shift expander. This came up with vector shifts
on sse-less X86 CPUs.

   <2 x i64>       = shl <2 x i64> <2 x i64>
-> i64,i64         = shl i64 i64; shl i64 i64
-> i32,i32,i32,i32 = shl_parts i32 i32 i64; shl_parts i32 i32 i64

Now we cast the last two i64s to the right type. Fixes the crash in PR14668.

llvm-svn: 173615
2013-01-27 11:19:11 +00:00
Benjamin Kramer 99c68dd964 X86: Do splat promotion later, so the optimizer can chew on it first.
This catches many cases where we can emit a more efficient shuffle for a
specific mask or when the mask contains undefs. Once the splat is lowered to
unpacks we can't do that anymore.

There is a possibility of moving the promotion after pshufb matching, but I'm
not sure if pshufb with a mask loaded from memory is faster than 3 shuffles, so
I avoided that for now.

llvm-svn: 173569
2013-01-26 11:44:21 +00:00
Benjamin Kramer 7268a05178 FileCheckize and merge some tests.
llvm-svn: 173568
2013-01-26 11:14:32 +00:00
Eli Bendersky 597fc1233a In this patch, we teach X86_64TargetMachine that it has an ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32 bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Eli Bendersky e6abe83258 Now that llvm-dwarfdump supports flags to specify which DWARF section to dump,
use them in tests that run llvm-dwarfdump. This is in order to make tests as
specific as possible.

llvm-svn: 173498
2013-01-25 21:44:53 +00:00
Andrew Trick e2c3f5c982 MIsched: Improve the interface to SchedDFS analysis (subtrees).
Allow the strategy to select SchedDFS. Allow the results of SchedDFS
to affect initialization of the scheduler state.

llvm-svn: 173425
2013-01-25 06:33:57 +00:00
Andrew Trick 44f750a3e5 MISched: Add SchedDFSResult to ScheduleDAGMI to formalize the
interface and allow other strategies to select it.

llvm-svn: 173413
2013-01-25 04:01:04 +00:00
Bill Wendling 7c8f96a91b Add the heuristic to differentiate SSPStrong from SSPRequired.
The requirements of the strong heuristic are:

* A Protector is required for functions which contain an array, regardless of
  type or length.

* A Protector is required for functions which contain a structure/union which
  contains an array, regardless of type or length.  Note, there is no limit to
  the depth of nesting.

* A protector is required when the address of a local variable (i.e., stack
  based variable) is exposed. (E.g., such as through a local whose address is
  taken as part of the RHS of an assignment or a local whose address is taken as
  part of a function argument.)

llvm-svn: 173231
2013-01-23 06:43:53 +00:00
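
The three SSPStrong rules listed in r173231 can be read as a single predicate over a function's locals. The sketch below is a toy model in Python, not the LLVM pass: the Scalar/Array/Aggregate/Local classes and the address_exposed flag are hypothetical, and the flag is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Scalar:
    name: str


@dataclass
class Array:
    element: "Type"
    length: int


@dataclass
class Aggregate:
    """A struct or union; fields may nest to any depth."""
    fields: List["Type"] = field(default_factory=list)


Type = Union[Scalar, Array, Aggregate]


@dataclass
class Local:
    type: Type
    address_exposed: bool = False  # assumed precomputed (hypothetical flag)


def contains_array(t: Type) -> bool:
    """Rules 1 and 2: an array, or an aggregate containing one at any depth."""
    if isinstance(t, Array):
        return True
    if isinstance(t, Aggregate):
        return any(contains_array(f) for f in t.fields)
    return False


def needs_protector_strong(locals_: List[Local]) -> bool:
    """Rule 3 adds locals whose address is exposed."""
    return any(contains_array(l.type) or l.address_exposed for l in locals_)
```

Note that array length and element type never enter the decision, which is the point of "regardless of type or length" in the commit message.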
Bill Wendling d154e283f2 Add the IR attribute 'sspstrong'.
SSPStrong applies a heuristic to insert stack protectors in these situations:

* A Protector is required for functions which contain an array, regardless of
  type or length.

* A Protector is required for functions which contain a structure/union which
  contains an array, regardless of type or length.  Note, there is no limit to
  the depth of nesting.

* A protector is required when the address of a local variable (i.e., stack
  based variable) is exposed. (E.g., such as through a local whose address is
  taken as part of the RHS of an assignment or a local whose address is taken as
  part of a function argument.)

This patch implements the SSPStrong attribute to be equivalent to
SSPRequired. This will change in a subsequent patch.

llvm-svn: 173230
2013-01-23 06:41:41 +00:00
Michael Liao 3dffc5e2b7 Fix an issue of pseudo atomic instruction DAG schedule
- Add list of physical registers clobbered in pseudo atomic insts
  Physical registers are clobbered when pseudo atomic instructions are
  expanded. Add them to the clobber list to prevent the DAG scheduler from
  mis-scheduling them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 173200
2013-01-22 21:47:38 +00:00
NAKAMURA Takumi 9439237063 llvm/test/CodeGen/X86/win_ftol2.ll: Add -cpu=generic to appease valgrind.
On valgrind the processor is reported as:
  Host CPU: athlon-fx

llvm-svn: 172983
2013-01-20 15:40:02 +00:00
Nadav Rotem 9450fcfff1 Revert 172708.
The optimization handles esoteric cases but adds a lot of complexity both to the X86 backend and to other backends.
This optimization disables an important canonicalization of chains of SEXT nodes and makes SEXT and ZEXT asymmetrical.
Disabling the canonicalization of consecutive SEXT nodes into a single node disables other DAG optimizations that assume
that there is only one SEXT node. The AVX mask optimization is one example. Additionally this optimization does not update the cost model.

llvm-svn: 172968
2013-01-20 08:35:56 +00:00
Nadav Rotem 7b3120b9ae On Sandybridge split unaligned 256bit stores into two xmm-sized stores.
llvm-svn: 172894
2013-01-19 08:38:41 +00:00
Nadav Rotem 7431211214 On Sandybridge loading unaligned 256bits using two XMM loads (vmovups and vinsertf128) is faster than using a single vmovups instruction.
llvm-svn: 172868
2013-01-18 23:10:30 +00:00
NAKAMURA Takumi b72e763325 llvm/test/CodeGen/X86/Atomics-64.ll: Tweak for 2nd RUN not to overwrite %t. It sometimes causes spurious failure on lit win32.
Feel free to prune or suppress each output.

llvm-svn: 172823
2013-01-18 14:52:02 +00:00
Elena Demikhovsky f6a30e05d5 Optimization for the following SIGN_EXTEND pairs:
v8i8  -> v8i64, 
v8i8  -> v8i32, 
v4i8  -> v4i64, 
v4i16 -> v4i64 
for AVX and AVX2.

Bug 14865.

llvm-svn: 172708
2013-01-17 09:59:53 +00:00
Benjamin Kramer bcd14a0f26 X86: Add patterns for X86ISD::VSEXT in registers.
Those can occur when something between the sextload and the store is on the same
chain and blocks isel. Fixes PR14887.

llvm-svn: 172353
2013-01-13 11:37:04 +00:00
Preston Gurd 99c6990457 Update patch for the pad short functions pass for Intel Atom (only).
Adds a check for -Oz, changes the code to not re-visit BBs,
and skips over DBG_VALUE instrs.

Patch by Andy Zhang.

llvm-svn: 172258
2013-01-11 22:06:56 +00:00
Tim Northover 3a51aab390 Simplify writing floating types to assembly.
This removes previous special cases for each floating-point type in favour of a
shared codepath.

llvm-svn: 172189
2013-01-11 10:36:13 +00:00
NAKAMURA Takumi e46e8225f4 llvm/test/CodeGen/X86/ms-inline-asm.ll: Fixup; Globals don't have a leading underscore in the symbol on Linux.
llvm-svn: 172139
2013-01-10 23:02:48 +00:00
Evan Cheng c8444b159a PR14896: Handle memcpy from constant string where the memcpy size is larger than the string size.
llvm-svn: 172124
2013-01-10 22:13:27 +00:00
Chad Rosier a4bc9437a2 [ms-inline asm] Add support for calling functions from inline assembly.
Part of rdar://12991541

llvm-svn: 172121
2013-01-10 22:10:27 +00:00
Evan Cheng 5652a8df32 Fix a DAG combine bug where visitBRCOND() transforms br(xor(x, y)) to br(x != y).
It cached XOR's operands before calling visitXOR() but failed to update the
operands when visitXOR changed the XOR node.

rdar://12968664

llvm-svn: 171999
2013-01-09 20:56:40 +00:00
Nadav Rotem 3f5825c6c1 add -march to the test
llvm-svn: 171956
2013-01-09 07:04:23 +00:00
Nadav Rotem 977e0be4a0 Efficient lowering of vector sdiv when the divisor is a splatted power of two constant.
PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.

Patch by Zvi Rackover.

llvm-svn: 171953
2013-01-09 05:14:33 +00:00
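
The log for r171953 does not spell out the lowered sequence. A common scalar form for signed division by 2^k is "arithmetic shift to get the sign, logical shift to form a bias, add, arithmetic shift", and the sketch below assumes that form, applied per lane on simulated 32-bit integers. All names are hypothetical and this is Python, not the DAG combiner code.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(v: int) -> int:
    """Interpret the low 32 bits of v as a signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def sdiv_pow2_lane(x: int, k: int) -> int:
    """Signed x / 2**k (rounding toward zero) using only shifts and an add."""
    assert 1 <= k <= 31
    ux = x & MASK32
    sign = MASK32 if ux & 0x80000000 else 0   # sra x, 31: all ones if negative
    bias = sign >> (32 - k)                   # srl: 2**k - 1 for negatives, else 0
    biased = (ux + bias) & MASK32             # add, wrapping at 32 bits
    return to_signed32(biased) >> k           # sra by k (Python >> is arithmetic)


def sdiv_pow2_vector(lanes, k):
    """The vector case applies the same sequence to every lane."""
    return [sdiv_pow2_lane(x, k) for x in lanes]
```

The bias is what turns the floor rounding of an arithmetic shift into the round-toward-zero behaviour that sdiv requires for negative dividends.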
Preston Gurd a01daace88 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Craig Topper 4f1c7256f9 Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior.
cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix.
cvtt*2si/cvt*2si should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix.

Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein.

llvm-svn: 171668
2013-01-06 20:39:29 +00:00
Evan Cheng 3fb03e23a4 Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch.
llvm-svn: 171665
2013-01-06 19:00:15 +00:00
Craig Topper 92a70b1e65 Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks.
llvm-svn: 171608
2013-01-05 07:39:25 +00:00
Nadav Rotem 478b6a47ec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Preston Gurd e36b685a94 The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Nadav Rotem c616a5408a Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message:
Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171468
2013-01-04 17:35:21 +00:00
Elena Demikhovsky 5f2f06d2d9 Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1.
Added a test.

llvm-svn: 171467
2013-01-03 08:48:33 +00:00
Michael Gottesman 820aac1c78 Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks."
This reverts commit r171461 since it breaks the following tests:

Clang :: Analysis/outofbound-notwork.c
Clang :: Analysis/string-fail.c
Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp
Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp
Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp
Clang :: CXX/temp/temp.param/p14.cpp
Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp
Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c
Clang :: CodeGen/blocks-2.c
Clang :: CodeGen/libcalls-d.c
Clang :: CodeGen/libcalls-ld.c
Clang :: CodeGenCXX/conversion-function.cpp
Clang :: CodeGenCXX/debug-info-limit-type.cpp
Clang :: CodeGenCXX/inheriting-constructor.cpp
Clang :: FixIt/fixit-errors.c
Clang :: FixIt/fixit-pmem.cpp
Clang :: Modules/namespaces.cpp
Clang :: PCH/changed-files.c
Clang :: PCH/pr4489.c
Clang :: PCH/source-manager-stack.c
Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp
Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp
Clang :: SemaTemplate/instantiate-function-1.mm

llvm-svn: 171466
2013-01-03 08:18:30 +00:00
Craig Topper 7c27cc9fd0 Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks.
llvm-svn: 171461
2013-01-03 06:40:20 +00:00
Jakob Stoklund Olesen 725d57682b Fix PR14732 by handling all kinds of IMPLICIT_DEF live ranges.
Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs
pass, and a few are reinserted by PHIElimination when a PHI argument is
<undef>.

RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look
like those created by PHIElimination, and that their live range never
leaves the basic block.

The PR14732 test case does tricks with PHI nodes that causes a longer
IMPLICIT_DEF live range to appear. This happens very rarely, but
RegisterCoalescer should be able to handle it.

llvm-svn: 171435
2013-01-03 00:47:51 +00:00
Tom Stellard 567f886eb0 DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes
DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two
mistakes:

1. It was checking the legality of scalar INT_TO_FP nodes and then generating
vector nodes.

2. It was passing the result value type to
TargetLoweringInfo::getOperationAction() when it should have been
passing the value type of the first operand.

llvm-svn: 171420
2013-01-02 22:13:01 +00:00
Nadav Rotem c8d7047fa9 AVX: Fix a bug in WidenMaskArithmetic.
llvm-svn: 171397
2013-01-02 17:40:39 +00:00
Dmitri Gribenko 56bf2e1830 Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 171250
2012-12-30 02:33:22 +00:00
Nadav Rotem 3da9ac72fa AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend.
llvm-svn: 171178
2012-12-28 05:45:24 +00:00
Nadav Rotem 2a054b4475 On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized
register. In most cases we actually compare or select YMM-sized registers
and mixing the two types creates horrible code. This commit optimizes
some of the transition sequences.

PR14657.

llvm-svn: 171148
2012-12-27 08:15:45 +00:00
NAKAMURA Takumi 40aa3285f4 llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083.
llvm-svn: 171084
2012-12-26 03:19:30 +00:00
NAKAMURA Takumi 334f685328 llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082.
llvm-svn: 171083
2012-12-26 03:08:55 +00:00
Benjamin Kramer a9f265ee98 Harden test so it's not affected by changes to compare lowering.
This only failed on hosts that don't have SSE41.

llvm-svn: 171066
2012-12-25 13:23:23 +00:00
Benjamin Kramer 81b5a8fd2e X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity.
llvm-svn: 171064
2012-12-25 13:09:08 +00:00
Benjamin Kramer df4af41b9b X86: Custom lower <2 x i64> eq and ne when SSE41 is not available.
pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack.
Small speedup on loop-vectorized viterbi (-march=core2).

llvm-svn: 171063
2012-12-25 12:54:19 +00:00
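
To see why the pcmpeqd / pshufd / pand sequence from r171063 computes a 64-bit equality, the sketch below simulates the instructions on Python lists of 32-bit lanes. It uses a single swap shuffle, which corresponds to the form after r171064 rather than the two-shuffle form named in this commit. The helper names mirror the instruction mnemonics but are otherwise hypothetical.

```python
LANE = 0xFFFFFFFF


def split_v2i64(v):
    """View two 64-bit lanes as four 32-bit lanes, low half first."""
    out = []
    for q in v:
        out += [q & LANE, (q >> 32) & LANE]
    return out


def pcmpeqd(a, b):
    """Per-lane 32-bit equality: all ones if equal, else zero."""
    return [LANE if x == y else 0 for x, y in zip(a, b)]


def pshufd_swap_halves(v):
    """Swap the two 32-bit halves inside each 64-bit lane."""
    return [v[1], v[0], v[3], v[2]]


def pand(a, b):
    return [x & y for x, y in zip(a, b)]


def cmpeq_v2i64(a, b):
    """A 64-bit lane is equal only if both of its 32-bit halves are equal."""
    eq32 = pcmpeqd(split_v2i64(a), split_v2i64(b))
    both = pand(eq32, pshufd_swap_halves(eq32))
    return [both[0] | (both[1] << 32), both[2] | (both[3] << 32)]
```
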
NAKAMURA Takumi 1b18db7ea3 llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple.
llvm-svn: 171029
2012-12-24 11:14:06 +00:00
Nadav Rotem dc0ad92b64 Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned.
When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding
tables and removes the alignment restrictions from VEX-encoded instructions.

llvm-svn: 171024
2012-12-24 09:40:33 +00:00
Benjamin Kramer 76268ac682 X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.
pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.

llvm-svn: 170985
2012-12-22 16:07:56 +00:00
Benjamin Kramer b2f0a2bd4b X86: Emit vector sext as shuffle + sra if vpmovsx is not available.
Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better
than scalarized loads. Fixes PR14590.

llvm-svn: 170984
2012-12-22 11:34:28 +00:00
Nadav Rotem d5aae980cb In some cases, due to scheduling constraints we copy the EFLAGS.
The only way to read the eflags is using push and pop. If we don't
adjust the stack then we run over the first frame index. This is
not something that we want to do, so we have to make sure that
our machine function does not copy the flags. If it does then
we have to emit the prolog that adjusts the stack.

rdar://12896831

llvm-svn: 170961
2012-12-21 23:48:49 +00:00
Benjamin Kramer b4688f84bd try to unbreak ppc buildbots.
llvm-svn: 170913
2012-12-21 18:11:45 +00:00
Benjamin Kramer 82d1c371e2 X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization.
Part of PR14667.

llvm-svn: 170908
2012-12-21 17:46:58 +00:00
Eric Christopher 6e47b725ff Move these files over to the debug info directory.
llvm-svn: 170810
2012-12-21 00:03:42 +00:00
Bob Wilson 3365b80290 Do not introduce vector operations in functions marked with noimplicitfloat.
<rdar://problem/12879313>

llvm-svn: 170630
2012-12-20 01:36:20 +00:00
Elena Demikhovsky 14a4af0e66 Optimized load + SIGN_EXTEND patterns in the X86 backend.
llvm-svn: 170506
2012-12-19 07:50:20 +00:00
Craig Topper 63f5921776 Teach SimplifySetCC that comparing AssertZext i1 against a constant 1 can be rewritten as a compare against a constant 0 with the opposite condition.
llvm-svn: 170495
2012-12-19 06:12:28 +00:00
Craig Topper f924a58af1 Add rest of BMI/BMI2 instructions to the folding tables as well as popcnt and lzcnt.
llvm-svn: 170304
2012-12-17 05:02:29 +00:00
Benjamin Kramer b16ccde7a4 X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible.
We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases
if y is a constant. DAGCombiner canonicalizes those so we first have to undo the
canonicalization for those cases. The pattern occurs in gzip when the loop
vectorizer is enabled. Part of PR14613.

llvm-svn: 170273
2012-12-15 16:47:44 +00:00
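
The pattern in r170273, "x >= y ? x-y : 0", is unsigned saturating subtraction, which is what psubus computes per lane. A small Python check of that equivalence on unsigned 16-bit lanes follows; the function names are hypothetical and the lane width is an assumption chosen for the example.

```python
def select_form(xs, ys):
    """The source pattern: x >= y ? x - y : 0, lane by lane."""
    return [x - y if x >= y else 0 for x, y in zip(xs, ys)]


def subus(xs, ys):
    """Unsigned saturating subtract: clamp the difference at zero."""
    return [max(x - y, 0) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [0, 1, 100, 65535, 7, 300, 0, 42]
    ys = [0, 2, 99, 1, 7, 65535, 65535, 41]
    print(select_form(xs, ys) == subus(xs, ys))
```
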
Nadav Rotem 8487537bdb TypeLegalizer: Do not generate target specific nodes with illegal types, because we can't type-legalize them.
llvm-svn: 170245
2012-12-14 21:20:37 +00:00
Evan Cheng bf0baa9de7 Fix a bug in DAGCombiner::MatchBSwapHWord. Make sure the node has operands before referencing them. rdar://12868039
llvm-svn: 170078
2012-12-13 01:34:32 +00:00
NAKAMURA Takumi be230b8fdb llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Fix possible typo(s) in CHECK-NOT lines.
Found by Alexander Zinenko, thanks!

llvm-svn: 169978
2012-12-12 13:34:20 +00:00
NAKAMURA Takumi cae5321a3b llvm/test/CodeGen/X86/atom-bypass-slow-division.ll: Rename symbols, s/test_/Test/g, not to mismatch "CHECK(-NOT): test".
llvm-svn: 169977
2012-12-12 13:34:14 +00:00
NAKAMURA Takumi 69d1405e48 llvm/test/CodeGen/X86/store_op_load_fold.ll: Fix typo, s/CHECK_NEXT/CHECK-NEXT/
llvm-svn: 169957
2012-12-12 01:41:01 +00:00
NAKAMURA Takumi 01ac65af00 llvm/test/CodeGen/X86/store_op_load_fold.ll: Add explicit triple.
llvm-svn: 169956
2012-12-12 01:40:56 +00:00
Manman Ren 82751a105c DAGCombine: clamp hi bit in APInt::getBitsSet to avoid assertion
rdar://12838504

llvm-svn: 169951
2012-12-12 01:13:50 +00:00
Evan Cheng 04e5518783 Avoid using lossy load / stores for memcpy / memset expansion. e.g.
f64 load / store on non-SSE2 x86 targets.

llvm-svn: 169944
2012-12-12 00:42:09 +00:00
Chad Rosier d4c0c6cb22 Add a triple to this test.
llvm-svn: 169803
2012-12-11 00:51:36 +00:00
Chandler Carruth b27041c50b Fix a miscompile in the DAG combiner. Previously, we would incorrectly
try to reduce the width of this load, and would end up transforming:

  (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)
to
  (truncate (zextload i32 <ptr+4> as i64) to i32)

We lost the sext attached to the load while building the narrower i32
load, and replaced it with a zext because lshr always zext's the
results. Instead, bail out of this combine when there is a conflict
between a sextload and a zext narrowing. The rest of the DAG combiner
still optimizes the code down to the proper single instruction:

  movswl 6(...),%eax

Which is exactly what we wanted. Previously we read past the end *and*
missed the sign extension:

  movl 6(...), %eax

llvm-svn: 169802
2012-12-11 00:36:57 +00:00
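
The miscompile in r169802 can be reproduced on a byte buffer. The sketch below, in Python and assuming little-endian memory, evaluates the original expression, the incorrect narrowed load, and a sign-extending 16-bit load of the kind movswl performs. Offsets are relative to the start of the i48 value, so they do not match the displacement in the assembly quoted above; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def original_expr(mem: bytes, p: int) -> int:
    """(truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)"""
    v = int.from_bytes(mem[p:p + 6], "little", signed=True) & MASK64
    return (v >> 32) & MASK32


def bad_narrowing(mem: bytes, p: int) -> int:
    """(truncate (zextload i32 <ptr+4> as i64) to i32): reads bytes 4..7."""
    return int.from_bytes(mem[p + 4:p + 8], "little") & MASK32


def sign_extending_16bit_load(mem: bytes, p: int) -> int:
    """Load the top two bytes of the i48 and sign-extend to 32 bits."""
    return int.from_bytes(mem[p + 4:p + 6], "little", signed=True) & MASK32
```

With a negative i48 followed by unrelated bytes, the bad form both picks up the two bytes past the end of the value and loses the sign bits, while the 16-bit sign-extending load matches the original expression.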
Paul Redmond c4550d4967 move X86-specific test
This test case uses -mcpu=corei7 so it belongs in CodeGen/X86

Reviewed by: Nadav

llvm-svn: 169801
2012-12-11 00:36:43 +00:00
Chad Rosier df42cf39ab Fall back to the selection dag isel to select tail calls.
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles.  Testing with the external/internal nightly
test suite reveals no change in compile time performance.  Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures.  All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.

In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally.  While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.

Part of rdar://12553082

llvm-svn: 169796
2012-12-11 00:18:02 +00:00
Evan Cheng 79e2ca90bc Some enhancements for memcpy / memset inline expansion.
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they
   are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.

rdar://12760078

llvm-svn: 169791
2012-12-10 23:21:26 +00:00
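
Item 1 of r169791 (overlapping unaligned accesses for the trailing bytes) can be illustrated on byte buffers: a copy of 17 to 31 bytes needs only two 16-byte moves if the second one is placed so that it ends exactly at the last byte. The Python sketch below is an illustration of the idea only; the function name and the default width of 16 are assumptions, and slices stand in for the vector loads and stores.

```python
def copy_with_overlap(dst: bytearray, src: bytes, n: int, width: int = 16) -> None:
    """Copy n bytes (width < n <= 2 * width) using two width-sized moves.

    The second move starts at n - width, so it overlaps the first by
    2 * width - n bytes instead of falling back to smaller accesses.
    """
    assert width < n <= 2 * width
    dst[0:width] = src[0:width]            # first unaligned load / store pair
    dst[n - width:n] = src[n - width:n]    # second pair, overlapping the first


if __name__ == "__main__":
    src = bytes(range(40))
    dst = bytearray(40)
    copy_with_overlap(dst, src, 23)
    print(dst[:23] == src[:23])
```

The overlapping bytes are written twice with the same values, which is harmless for memcpy and memset and avoids a tail of progressively smaller moves.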
Craig Topper d8005db486 Teach DAG combine to handle vector add/sub with vectors of all 0s.
llvm-svn: 169727
2012-12-10 08:12:29 +00:00
Craig Topper a183ddb0fe Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined.
llvm-svn: 169684
2012-12-08 22:49:19 +00:00
Nadav Rotem ad0b5fbe8c When we use the BLEND instruction that uses the MSB as a mask, we can remove
the VSRI instruction before it since it does not affect the MSB.

Thanks Craig Topper for suggesting this.

llvm-svn: 169638
2012-12-07 21:43:11 +00:00
Nadav Rotem 481e50efe0 X86: Prefer using VPSHUFD over VPERMIL because it has better throughput.
llvm-svn: 169624
2012-12-07 19:01:13 +00:00
Nadav Rotem ac450eb59e Fix a bug in the code that merges consecutive stores. Previously we did not
check if loads that happen in between stores alias with the first store in the
chain, only with the second store onwards.

llvm-svn: 169516
2012-12-06 17:34:13 +00:00
Craig Topper 216bcd522b Remove intrinsic specific instructions for (V)MOVQUmr with patterns pointing to the normal instructions.
llvm-svn: 169482
2012-12-06 07:31:16 +00:00
Andrew Trick fda7a8832d RegisterPressureTracker: fix findUseBetween to handle DebugValue
llvm-svn: 169427
2012-12-05 21:37:50 +00:00
Andrew Trick 7f7cee39ab RegisterPressureTracker: Track live physical register by unit.
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.

llvm-svn: 169425
2012-12-05 21:37:42 +00:00
Elena Demikhovsky cd3c1c4a16 Simplified BLEND pattern matching for shuffles.
Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2.

llvm-svn: 169366
2012-12-05 09:24:57 +00:00
Evan Cheng d31802c1f6 Add x86 isel lowering logic to form bit test with inverted condition. e.g.
x ^ -1.

Patch by David Majnemer.
rdar://12755626

llvm-svn: 169339
2012-12-05 00:10:38 +00:00
Bill Wendling d7767125d5 Use the 'count' attribute to calculate the upper bound of an array.
The count attribute is more accurate with regards to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbound array by setting the count to -1 instead of the lower bound to
1 and upper bound to 0.

llvm-svn: 169312
2012-12-04 21:34:03 +00:00
Bill Wendling bfc0e5725f Add a 'count' field to the DWARF subrange.
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.

llvm-svn: 169218
2012-12-04 06:20:49 +00:00
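
The three cases for the 'count' field described in r169218 map onto a small classifier. The Python sketch below only restates the commit message; the function name and the returned labels are hypothetical and are not part of any LLVM or DWARF API.

```python
def describe_subrange(count: int) -> str:
    """Classify an array subrange by its 'count' field as described in r169218."""
    if count == -1:
        return "unbounded array"
    if count == 0:
        return "zero-element array"
    if count >= 1:
        return "constant-sized array"
    raise ValueError(f"unexpected count: {count}")
```
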
Nadav Rotem 1157e1410c Allow merging multiple store sequences on the same chain.
llvm-svn: 169111
2012-12-02 17:14:09 +00:00
Eli Bendersky b7b1ffc8e7 Fix an invalid regex in the test
llvm-svn: 169108
2012-12-02 15:46:02 +00:00
Andrew Trick b767d1eba8 misched: Fix RegisterPressureTracker handling of DebugVals.
Assertion failed: (TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker").
rdar://12790302.

llvm-svn: 169072
2012-12-01 01:22:49 +00:00
Andrew Trick d5953622ce misched: Fix the DAG builder to handle an undef operand at ExitSU.
Assertion failed: (VNI && "No value to read by operand")
rdar://12790267.

llvm-svn: 169071
2012-12-01 01:22:44 +00:00
Andrew Trick a01302182c misched: Fix LiveInterval update to better handle DebugVal.
Assertion failed: (itr != mi2iMap.end() && "Instruction not found in maps.")
rdar://12777252.

llvm-svn: 169070
2012-12-01 01:22:41 +00:00
Andrew Trick e7ea8aa48a misched: fix RegionBegin when DebugValues get shuffled to the top.
assert (RemainingInstrs == 0 && "Instruction count mismatch!")

rdar://12776937.

llvm-svn: 169069
2012-12-01 01:22:38 +00:00
Nadav Rotem 307d767177 When combining consecutive stores allow loads in between the stores, if the loads do not alias.
llvm-svn: 168832
2012-11-29 00:00:08 +00:00
Andrew Trick 48d392e81e misched: Analysis that partitions the DAG into subtrees.
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.

llvm-svn: 168773
2012-11-28 05:13:28 +00:00