Commit Graph

5423 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen 02845410f9 Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

llvm-svn: 145096
2011-11-23 04:03:08 +00:00
Chandler Carruth 4a87aa0c31 Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

llvm-svn: 145094
2011-11-23 03:03:21 +00:00
Hal Finkel 6f0ae783fe add basic PPC register-pressure feedback; adjust the vaarg test to match the new register-allocation pattern
llvm-svn: 145065
2011-11-22 16:21:04 +00:00
Chandler Carruth ee54feb6f6 Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

llvm-svn: 145062
2011-11-22 13:13:16 +00:00
Rafael Espindola c55e1af137 Add triple to the test.
llvm-svn: 145057
2011-11-22 06:36:25 +00:00
Rafael Espindola 2021f38281 If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

llvm-svn: 145056
2011-11-22 06:27:18 +00:00
Craig Topper 6270d072c5 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
llvm-svn: 145028
2011-11-21 08:26:50 +00:00
Craig Topper d12d6f4b1c Test case for r145026
llvm-svn: 145027
2011-11-21 06:58:09 +00:00
Craig Topper a065238c6e Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
llvm-svn: 145022
2011-11-21 01:12:36 +00:00
NAKAMURA Takumi 76dfa03874 test/CodeGen/X86/block-placement.ll: Relax expressions for Win32.
llvm-svn: 145011
2011-11-20 12:49:45 +00:00
Chandler Carruth 18dfac385b The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

llvm-svn: 145010
2011-11-20 11:22:06 +00:00
Chandler Carruth 20df3953d3 Add some comments to the latest test case I added here to document what
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

llvm-svn: 145006
2011-11-20 09:30:40 +00:00
Craig Topper e79761df73 Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
llvm-svn: 145005
2011-11-20 00:12:05 +00:00
Craig Topper a3a6583694 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
llvm-svn: 145004
2011-11-19 22:34:59 +00:00
Chandler Carruth f3dc9eff16 Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

llvm-svn: 144994
2011-11-19 10:26:02 +00:00
Craig Topper 6d77f4ae14 Test cases for SSSE3/AVX integer horizontal add/sub.
llvm-svn: 144990
2011-11-19 09:03:33 +00:00
Craig Topper de6b73bb4d Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
llvm-svn: 144987
2011-11-19 07:07:26 +00:00
Nadav Rotem 1ec141d0f9 Add AVX2 vpbroadcast support
llvm-svn: 144967
2011-11-18 02:49:55 +00:00
Devang Patel 107e8ec30d DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
llvm-svn: 144937
2011-11-17 23:43:15 +00:00
Chad Rosier f83ab704e4 When fast iseling a GEP, accumulate the offset rather than emitting a series of
ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to 
coalesce ADDs.
rdar://10412592

llvm-svn: 144886
2011-11-17 07:15:58 +00:00
Eli Friedman ff1eaa7578 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
llvm-svn: 144863
2011-11-16 23:50:22 +00:00
Evan Cheng 011538dc79 Another missing X86ISD::MOVLPD pattern. rdar://10450317
llvm-svn: 144839
2011-11-16 22:24:44 +00:00
Evan Cheng 822ddde50d Disable expensive two-address optimizations at -O0. rdar://10453055
llvm-svn: 144806
2011-11-16 18:44:48 +00:00
Eli Friedman e6270395e3 Fix testcase.
llvm-svn: 144769
2011-11-16 03:03:52 +00:00
Eli Friedman 87f92512c3 CONCAT_VECTORS can have more than two operands. PR11389.
llvm-svn: 144768
2011-11-16 02:52:39 +00:00
Nadav Rotem 37010002f2 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
llvm-svn: 144720
2011-11-15 22:50:37 +00:00
NAKAMURA Takumi f6b315c081 test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64.
llvm-svn: 144714
2011-11-15 22:30:37 +00:00
Pete Cooper 7c7ba1baa1 Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>

llvm-svn: 144705
2011-11-15 21:57:53 +00:00
Rafael Espindola f11e7f1305 We currently use a callback to handle an IL pass deleting a BB that still
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

llvm-svn: 144674
2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen 4949b9a283 Revert r144611 and r144613.
These tests are actually correct, clang was miscompiling ExeDepsFix::processUses.

Evan fixed the miscompilation in r144628.

llvm-svn: 144630
2011-11-15 07:13:03 +00:00
Chandler Carruth 9b548a7fcf Rather than trying to use the loop block sequence *or* the function
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

llvm-svn: 144627
2011-11-15 06:26:43 +00:00
Craig Topper 05baa85f58 Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.
llvm-svn: 144622
2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen 9c0de9bb6b Really fix test.
llvm-svn: 144613
2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen 14b66375a9 Allow for depencendy-breaking instructions before cvt*.
This should unbreak clang-x86_64-darwin10-RA, but I can't actually
reproduce the failure.

llvm-svn: 144611
2011-11-15 02:29:48 +00:00
Evan Cheng 7ca4b6eb5c Add vmov.f32 to materialize f32 immediate splats which cannot be handled by
integer variants. rdar://10437054

llvm-svn: 144608
2011-11-15 02:12:34 +00:00
Jakob Stoklund Olesen f8ad336bc4 Break false dependencies before partial register updates.
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602
2011-11-15 01:15:30 +00:00
Jim Grosbach 3e2c6f380c ARM VLDR/VSTR instructions don't need a size suffix.
Canonicallize on the non-suffixed form, but continue to accept assembly that
has any correctly sized type suffix.

llvm-svn: 144583
2011-11-14 23:03:21 +00:00
Chad Rosier 4e88fbebde Add newline to end of file. Thanks, Eli.
llvm-svn: 144579
2011-11-14 22:48:33 +00:00
Chad Rosier ab7223e99a Add support for inlining small memcpys.
rdar://10412592

llvm-svn: 144578
2011-11-14 22:46:17 +00:00
Chad Rosier 45110fdf8d Fix a performance regression from r144565. Positive offsets were being lowered
into registers, rather then encoded directly in the load/store.

llvm-svn: 144576
2011-11-14 22:34:48 +00:00
Evan Cheng fb13d32b3f Add a missing pattern for X86ISD::MOVLPD. rdar://10436044
llvm-svn: 144566
2011-11-14 20:35:52 +00:00
Chad Rosier adfd200bcb Add support for Thumb load/stores with negative offsets.
rdar://10412592

llvm-svn: 144565
2011-11-14 20:22:27 +00:00
Evan Cheng 30f44ad785 Teach two-address pass to re-schedule two-address instructions (or the kill
instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559
2011-11-14 19:48:55 +00:00
Pete Cooper 890e02e854 Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered
Constant idx case is still done in tablegen but other cases are then expanded

Fixes <rdar://problem/10435460>

llvm-svn: 144557
2011-11-14 19:38:42 +00:00
Jakob Stoklund Olesen 7e6004a3c1 Fix early-clobber handling in shrinkToUses.
I broke this in r144515, it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
2011-11-14 18:45:38 +00:00
Jakob Stoklund Olesen 7e07b388ac Delete stale comment.
llvm-svn: 144542
2011-11-14 18:03:05 +00:00
Chandler Carruth ed5aa547bc Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on
the sum of the edge weights not overflowing uint32, and crashed when
they did. This is generally safe as BranchProbabilityInfo tries to
provide this guarantee. However, the CFG can get modified during codegen
in a way that grows the *sum* of the edge weights. This doesn't seem
unreasonable (imagine just adding more blocks all with the default
weight of 16), but it is hard to come up with a case that actually
triggers 32-bit overflow. Fortuately, the single-source GCC build is
good at this. The solution isn't very pretty, but its no worse than the
previous code. We're already summing all of the edge weights on each
query, we can sum them, check for an overflow, compute a scale, and sum
them again.

I've included a *greatly* reduced test case out of the GCC source that
triggers it. It's a pretty lame test, as it clearly is just barely
triggering the overflow. I'd like to have something that is much more
definitive, but I don't understand the fundamental pattern that triggers
an explosion in the edge weight sums.

The buggy code is duplicated within this file. I'll colapse them into
a single implementation in a subsequent commit.

llvm-svn: 144526
2011-11-14 08:50:16 +00:00
Chad Rosier 2a1df883d0 Add support for ARM halfword load/stores and signed byte loads with negative
offsets.
rdar://10412592

llvm-svn: 144518
2011-11-14 04:09:28 +00:00
Chandler Carruth 1071cfa4ae Teach machine block placement to cope with unnatural loops. These don't
get loop info structures associated with them, and so we need some way
to make forward progress selecting and placing basic blocks. The
technique used here is pretty brutal -- it just scans the list of blocks
looking for the first unplaced candidate. It keeps placing blocks like
this until the CFG becomes tractable.

The cost is somewhat unfortunate, it requires allocating a vector of all
basic block pointers eagerly. I have some ideas about how to simplify
and optimize this, but I'm trying to get the logic correct first.

Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
there are other bugs that GCC is tickling that I'm reducing and working
on now.

llvm-svn: 144516
2011-11-14 00:00:35 +00:00
Chandler Carruth 8d15078927 Rewrite #3 of machine block placement. This is based somewhat on the
second algorithm, but only loosely. It is more heavily based on the last
discussion I had with Andy. It continues to walk from the inner-most
loop outward, but there is a key difference. With this algorithm we
ensure that as we visit each loop, the entire loop is merged into
a single chain. At the end, the entire function is treated as a "loop",
and merged into a single chain. This chain forms the desired sequence of
blocks within the function. Switching to a single algorithm removes my
biggest problem with the previous approaches -- they had different
behavior depending on which system triggered the layout. Now there is
exactly one algorithm and one basis for the decision making.

The other key difference is how the chain is formed. This is based
heavily on the idea Andy mentioned of keeping a worklist of blocks that
are viable layout successors based on the CFG. Having this set allows us
to consistently select the best layout successor for each block. It is
expensive though.

The code here remains very rough. There is a lot that needs to be done
to clean up the code, and to make the runtime cost of this pass much
lower. Very much WIP, but this was a giant chunk of code and I'd rather
folks see it sooner than later. Everything remains behind a flag of
course.

I've added a couple of tests to exercise the issues that this iteration
was motivated by: loop structure preservation. I've also fixed one test
that was exhibiting the broken behavior of the previous version.

llvm-svn: 144495
2011-11-13 11:20:44 +00:00
Chad Rosier 1198d894d0 The order in which the predicate is added differs between Thumb and ARM mode. Fix predicate when in ARM mode and restore SelectIntrinsicCall.
llvm-svn: 144494
2011-11-13 09:44:21 +00:00
Chad Rosier a476e391f1 Temporarily disable SelectIntrinsicCall when in ARM mode. This is causing failures.
llvm-svn: 144492
2011-11-13 05:14:43 +00:00
Chad Rosier c8cfd3a8fb Add support for emitting both signed- and zero-extend loads. Fix
SimplifyAddress to handle either a 12-bit unsigned offset or the ARM +/-imm8
offsets (addressing mode 3).  This enables a load followed by an integer 
extend to be folded into a single load.

For example:
ldrb r1, [r0]       ldrb r1, [r0]
uxtb r2, r1     =>
mov  r3, r2         mov  r3, r1

llvm-svn: 144488
2011-11-13 02:23:59 +00:00
Jakob Stoklund Olesen 6ddb767fb5 Remove the -color-ss-with-regs option.
It was off by default.

The new register allocators don't have the problems that made it
necessary to reallocate registers during stack slot coloring.

llvm-svn: 144481
2011-11-13 00:31:23 +00:00
Jakob Stoklund Olesen 7ef502f6d1 Delete the 'standard' spiller with used the old spilling framework.
The current register allocators all use the inline spiller.

llvm-svn: 144477
2011-11-12 23:29:02 +00:00
Jakob Stoklund Olesen ce4ef9f8d5 Remove histogram tests.
Counting the number of occurences of each opcode is not a useful test.

llvm-svn: 144474
2011-11-12 22:39:40 +00:00
Jakob Stoklund Olesen 0eac531bc2 RAGreedy is better about hinting now.
Or maybe we are just getting lucky.

llvm-svn: 144473
2011-11-12 22:39:37 +00:00
Jakob Stoklund Olesen 8ec1a92afd Linear scan is going away.
llvm-svn: 144472
2011-11-12 22:39:34 +00:00
Jakob Stoklund Olesen 654d60888e XFAIL test that depends on linear scan to remove dead code.
Filed PR11364 to track the problem.  Should the register allocator
eliminate dead code?

llvm-svn: 144471
2011-11-12 22:39:30 +00:00
Jakob Stoklund Olesen fa3a8ee6e2 Remove obsolete test.
This test was committed with a bugfix to RemoveCopyByCommutingDef, but
that optimization is no longer triggered by this test.

llvm-svn: 144470
2011-11-12 22:39:27 +00:00
Jakob Stoklund Olesen 80b3d299a9 Remove obsolete test.
This test is for a very specific LocalRewriter bug.  LocalRewriter is
going away.

llvm-svn: 144469
2011-11-12 22:39:24 +00:00
Jakob Stoklund Olesen 0c7d9d90ef Remove obsolete test.
I don't think this test does what is was supposed to do, and
LocalRewriter is going away anyway.

llvm-svn: 144463
2011-11-12 20:37:57 +00:00
Jakob Stoklund Olesen 126f9779c3 Eliminate more linear scan tests.
llvm-svn: 144462
2011-11-12 20:35:26 +00:00
Jakob Stoklund Olesen 9d090daa33 Switch a couple -O0 tests to RABasic.
llvm-svn: 144461
2011-11-12 20:11:04 +00:00
Jakob Stoklund Olesen 4deff7bc1d Switch a few tests off linearscan.
llvm-svn: 144460
2011-11-12 19:53:52 +00:00
Jakob Stoklund Olesen 6ac6aa782d Delete old test of a VirtRegRewriter feature.
This test doesn't expose the issue with RAGreedy.

I filed PR11363 to track the missing InlineSpiller feature.

llvm-svn: 144459
2011-11-12 19:53:48 +00:00
Jakob Stoklund Olesen 74d091b395 Remove old test that doesn't make sense.
The test is checking that the output doesn't contains any 'mov '
strings. It does contain movl, though.

llvm-svn: 144458
2011-11-12 19:53:45 +00:00
Craig Topper 3dc75f9e3b Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code.
llvm-svn: 144457
2011-11-12 09:58:49 +00:00
Eli Friedman 9d448e4a42 Don't try to form pre/post-indexed loads/stores until after LegalizeDAG runs. Fixes PR11029.
llvm-svn: 144438
2011-11-12 00:35:34 +00:00
Chad Rosier a7ebc5617d Add support in fast-isel for selecting memset/memcpy/memmove intrinsics.
llvm-svn: 144426
2011-11-11 23:31:03 +00:00
Chad Rosier ab1a4a2301 Loosen test by using REs. Approved by Devang.
llvm-svn: 144425
2011-11-11 23:25:38 +00:00
Andrew Trick 28c1d18434 Preserve MachineMemOperands in ARMLoadStoreOptimizer.
Fixes PR8113.

llvm-svn: 144409
2011-11-11 22:18:09 +00:00
Dan Bailey 089cc53232 allow non-device function calls in PTX when natively handling device-side printf
llvm-svn: 144388
2011-11-11 14:45:12 +00:00
Craig Topper ea28a34c43 Add lowering for AVX2 shift instructions.
llvm-svn: 144380
2011-11-11 07:39:23 +00:00
Chad Rosier 7ddd63ce4e Add support for using immediates with select instructions.
rdar://10412592

llvm-svn: 144376
2011-11-11 06:20:39 +00:00
Eli Friedman c4a001478c Make sure to expand SIGN_EXTEND_INREG for NEON vectors. PR11319, round 3.
llvm-svn: 144361
2011-11-11 03:16:38 +00:00
Chad Rosier 2a3503e061 Add support for using MVN to materialize negative constants.
rdar://10412592

llvm-svn: 144348
2011-11-11 00:36:21 +00:00
Chad Rosier d1762e00e2 When in ARM mode, LDRH/STRH require special handling of negative offsets.
For correctness, disable this for now.
rdar://10418009

llvm-svn: 144316
2011-11-10 21:09:49 +00:00
NAKAMURA Takumi 270354100a test/CodeGen/X86/lsr-loop-exit-cond.ll: Try to appease linux and freebsd bots to specify explicit -mtriple=x86_64-darwin.
I guess it expects -relocation-model=pic.

llvm-svn: 144290
2011-11-10 14:18:59 +00:00
Evan Cheng d33b2d6b7a Use a bigger hammer to fix PR11314 by disabling the "forcing two-address
instruction lower optimization" in the pre-RA scheduler.

The optimization, rather the hack, was done before MI use-list was available.
Now we should be able to implement it in a better way, perhaps in the
two-address pass until a MI scheduler is available.

Now that the scheduler has to backtrack to handle call sequences. Adding
artificial scheduling constraints is just not safe. Furthermore, the hack
is not taking all the other scheduling decisions into consideration so it's just
as likely to pessimize code. So I view disabling this optimization goodness
regardless of PR11314.

llvm-svn: 144267
2011-11-10 07:43:16 +00:00
Chad Rosier 3fbd094ad9 For immediate encodings of icmp, zero or sign extend first. Then
determine if the value is negative and flip the sign accordingly.
rdar://10422026

llvm-svn: 144258
2011-11-10 01:30:39 +00:00
Jakob Stoklund Olesen eef48b6938 Strip old implicit operands after foldMemoryOperand.
The TII.foldMemoryOperand hook preserves implicit operands from the
original instruction.  This is not what we want when those implicit
operands refer to the register being spilled.

Implicit operands referring to other registers are preserved.

This fixes PR11347.

llvm-svn: 144247
2011-11-10 00:17:03 +00:00
Eli Friedman 2d4055b683 Make sure we correctly unroll conversions between v2f64 and v2i32 on ARM.
llvm-svn: 144241
2011-11-09 23:36:02 +00:00
Eli Friedman 53218b6fcc Add check so we don't try to perform an impossible transformation. Fixes issue from PR11319.
llvm-svn: 144216
2011-11-09 22:25:12 +00:00
Nadav Rotem 1938482bfa AVX2: Add patterns for variable shift operations
llvm-svn: 144212
2011-11-09 21:22:13 +00:00
Chad Rosier c22f6518b2 Use REs to remove dependencies on the register allocation order.
llvm-svn: 144209
2011-11-09 20:06:13 +00:00
Duncan Sands 635e4efca0 Speculatively revert commit 144124 (djg) in the hope that the 32 bit
dragonegg self-host buildbot will recover (it is complaining about object
files differing between different build stages).  Original commit message:

Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144188
2011-11-09 14:20:48 +00:00
Nadav Rotem 79135d844d Add AVX2 support for vselect of v32i8
llvm-svn: 144187
2011-11-09 13:21:28 +00:00
Craig Topper f87a2bef51 Enable execution dependency fix pass for YMM registers when AVX2 is enabled. Add AVX2 logical operations to list of replaceable instructions.
llvm-svn: 144179
2011-11-09 09:37:21 +00:00
Craig Topper c9eb09d3b8 Add instruction selection for AVX2 integer comparisons.
llvm-svn: 144176
2011-11-09 08:06:13 +00:00
Craig Topper 8c8a431057 Add AVX2 instruction lowering for add, sub, and mul.
llvm-svn: 144174
2011-11-09 07:28:55 +00:00
Chad Rosier 595d419427 Add support for encoding immediates in icmp and fcmp. Hopefully, this will
remove a fair number of unnecessary materialized constants.
rdar://10412592

llvm-svn: 144163
2011-11-09 03:22:02 +00:00
Jakob Stoklund Olesen 3dc89c9768 Collapse DomainValues across loop back-edges.
During the initial RPO traversal of the basic blocks, remember the ones
that are incomplete because of back-edges from predecessors that haven't
been visited yet.

After the initial RPO, revisit all those loop headers so the incoming
DomainValues on the back-edges can be properly collapsed.

This will properly fix execution domains on software pipelined code,
like the included test case.

llvm-svn: 144151
2011-11-09 01:06:56 +00:00
Dan Gohman a4bc6171a5 Add a hack to the scheduler to disable pseudo-two-address dependencies in
basic blocks containing calls. This works around a problem in which
these artificial dependencies can get tied up in calling seqeunce
scheduling in a way that makes the graph unschedulable with the current
approach of using artificial physical register dependencies for calling
sequences. This fixes PR11314.

llvm-svn: 144124
2011-11-08 21:29:06 +00:00
Evan Cheng c3770ac687 Add workaround for Cortex-M3 errata 602117 by replacing ldrd x, y, [x] with ldm or ldr pairs.
llvm-svn: 144123
2011-11-08 21:21:09 +00:00
Pete Cooper fbbbd04705 Adding test for machine-licm operating on invariant load instructions
llvm-svn: 144104
2011-11-08 19:06:53 +00:00
Lang Hames b85fcd07df Lower mem-ops to unaligned i32/i16 load/stores on ARM where supported.
Add support for trimming constants to GetDemandedBits. This fixes some funky
constant generation that occurs when stores are expanded for targets that don't
support unaligned stores natively.

llvm-svn: 144102
2011-11-08 18:56:23 +00:00
NAKAMURA Takumi d8d583f766 test/CodeGen/X86/vec_shuffle-39.ll: Add explicit -mtriple=x86_64-linux. Passing packed value is not compatible on Win32 x64.
llvm-svn: 144068
2011-11-08 03:46:39 +00:00
NAKAMURA Takumi ac9ef21f02 test/CodeGen/X86/vec_shuffle-38.ll: Relax expression for Win32 x64.
llvm-svn: 144067
2011-11-08 03:46:32 +00:00
NAKAMURA Takumi 33dac06330 test/CodeGen/X86/vec_shuffle.ll: Add explicit -mtriple=i686-linux. We may see some suboptimal frame (%ebp) emission on certain hosts. Possible [PR11031]
llvm-svn: 144066
2011-11-08 03:46:25 +00:00
Eli Friedman 6f84fed675 Make sure to mark vector extload's as expand on ARM. Fixes PR11319.
llvm-svn: 144057
2011-11-08 01:43:53 +00:00
Eli Friedman f2a9bd4b1e Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318.
Re-commit of r144034, with an extra fix so that RemoveDeadNode doesn't blow up.

llvm-svn: 144055
2011-11-08 01:25:24 +00:00
Evan Cheng 91b56e0390 Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222
llvm-svn: 144052
2011-11-08 00:31:58 +00:00
Bill Wendling 2197b015c8 Convert to the new EH model.
llvm-svn: 144049
2011-11-08 00:17:28 +00:00
Bill Wendling 9b7942a543 Convert tests to the new EH model.
llvm-svn: 144048
2011-11-08 00:09:27 +00:00
Chad Rosier 5de1bea5c9 Enable support for returning i1, i8, and i16. Nothing special todo as it's the
callee's responsibility to sign or zero-extend the return value.  The additional
test case just checks to make sure the calls are selected (i.e., -fast-isel-abort
doesn't assert).

llvm-svn: 144047
2011-11-08 00:03:32 +00:00
Pete Cooper 2dc40434aa Added missing newline
llvm-svn: 144046
2011-11-08 00:03:24 +00:00
Eli Friedman a35a5295e0 Revert r144034 while I try to track down a crash.
llvm-svn: 144044
2011-11-07 23:53:20 +00:00
Jakob Stoklund Olesen 9279f9efbc Fix test for Windows as well.
llvm-svn: 144038
2011-11-07 23:10:43 +00:00
Jakob Stoklund Olesen a70e9417fb Kill and collapse outstanding DomainValues.
DomainValues that are only used by "don't care" instructions are now
collapsed to the first possible execution domain after all basic blocks
have been processed.  This typically means the PS domain on x86.

For example, the vsel_i64 and vsel_double functions in sse2-blend.ll are
completely collapsed to the PS domain instead of containing a mix of
execution domains created by isel.

llvm-svn: 144037
2011-11-07 23:08:21 +00:00
Pete Cooper 7a4be01ac8 InstCombine now optimizes vector udiv by power of 2 to shifts
Fixes r8429

llvm-svn: 144036
2011-11-07 23:04:49 +00:00
Eli Friedman 55a86d32d3 Add a bunch of calls to RemoveDeadNode in LegalizeDAG, so legalization doesn't get confused by CSE later on. Fixes PR11318.
llvm-svn: 144034
2011-11-07 22:51:10 +00:00
Benjamin Kramer 69d57cf9c4 Simplify some uses of utohexstr.
As a side effect hex is printed lowercase instead of uppercase now.

llvm-svn: 144013
2011-11-07 21:00:59 +00:00
Jakob Stoklund Olesen 7f076cb6cc Fix test for Linux.
llvm-svn: 144003
2011-11-07 20:47:23 +00:00
Jakob Stoklund Olesen 0241308954 Expand V_SET0 to xorps by default.
The xorps instruction is smaller than pxor, so prefer that encoding.

The ExecutionDepsFix pass will switch the encoding to pxor and xorpd
when appropriate.

llvm-svn: 143996
2011-11-07 19:15:58 +00:00
Craig Topper a6d409d543 Add AVX2 variable shift instructions and intrinsics.
llvm-svn: 143915
2011-11-07 08:26:24 +00:00
Craig Topper ff39be0afc Add AVX2 VPMOVMASK instructions and intrinsics.
llvm-svn: 143904
2011-11-07 03:20:35 +00:00
Craig Topper e122dcbf4a Add AVX2 VEXTRACTI128 and VINSERTI128 instructions. Fix VPERM2I128 to be qualified with HasAVX2 instead of HasAVX. Mark VINSERTF128 and VEXTRACTF128 as never having side effects.
llvm-svn: 143902
2011-11-07 02:00:04 +00:00
Craig Topper f01f1b5cb9 More AVX2 instructions and their intrinsics.
llvm-svn: 143895
2011-11-06 23:04:08 +00:00
Craig Topper 05d1cb98e7 Add more AVX2 instructions and intrinsics.
llvm-svn: 143861
2011-11-06 06:12:20 +00:00
Chad Rosier d0191a53c9 Add support for passing i1, i8, and i16 call parameters. Also, be sure to
zero-extend the constant integer encoding.  Test case provides testing for
both call parameters and materialization of i1, i8, and i16 types.

llvm-svn: 143821
2011-11-05 20:16:15 +00:00
Benjamin Kramer c74798d5cf Add an option to pad an uleb128 to MCObjectWriter and remove the uleb128 encoding from the DWARF asm printer.
As a side effect we now print dwarf ulebs with .ascii directives.

llvm-svn: 143809
2011-11-05 11:52:44 +00:00
Eli Friedman 8f249600e7 Enhanced vzeroupper insertion pass that avoids inserting vzeroupper where it is unnecessary through local analysis. Patch from Bruno Cardoso Lopes, with some additional changes.
I'm going to wait for any review comments and perform some additional testing before turning this on by default.

llvm-svn: 143750
2011-11-04 23:46:11 +00:00
Craig Topper b9a46e6b83 Add intrinsics for X86 vcvtps2ph and vcvtph2ps instructions
llvm-svn: 143682
2011-11-04 06:59:21 +00:00
Chad Rosier f3e73ad5da Add fast-isel support for returning i1, i8, and i16.
llvm-svn: 143669
2011-11-04 00:50:21 +00:00
Dan Gohman 198b7ffc11 Reapply r143206, with fixes. Disallow physical register lifetimes
across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660
2011-11-03 21:49:52 +00:00
Pete Cooper 65ba66c660 Reverted r143600 - selector reference change
llvm-svn: 143646
2011-11-03 20:47:50 +00:00
Dan Bailey b68515c232 fixed global array handling for ptx to use the correct bit widths
llvm-svn: 143640
2011-11-03 19:24:46 +00:00
Craig Topper 0e7cbbabea Add new X86 AVX2 VBROADCAST instructions.
llvm-svn: 143612
2011-11-03 07:35:53 +00:00
Chad Rosier bf5f4bec1a Add support for sign-extending non-legal types in SelectSIToFP().
llvm-svn: 143603
2011-11-03 02:04:59 +00:00
Pete Cooper e6173d81ae Treat objc selector reference globals as invariant so that MachineLICM can hoist them out of loops. Fixes <rdar://problem/6027699>
llvm-svn: 143600
2011-11-03 00:56:36 +00:00
Lang Hames 9929c423a1 Try to lower memset/memcpy/memmove to vector instructions on ARM where the alignment permits.
llvm-svn: 143582
2011-11-02 22:52:45 +00:00
Nick Lewycky d1ee7f8cf1 Don't emit a directory entry for the value in DW_AT_comp_dir, that is always
implied by directory index zero.

llvm-svn: 143570
2011-11-02 20:55:33 +00:00
Chad Rosier 9cf803c4bf Add support for comparing integer non-legal types.
llvm-svn: 143559
2011-11-02 18:08:25 +00:00
Craig Topper a47b05c7f3 More AVX2 instructions and intrinsics.
llvm-svn: 143536
2011-11-02 06:54:17 +00:00
Craig Topper 682b850602 Add a bunch more X86 AVX2 instructions and their corresponding intrinsics.
llvm-svn: 143529
2011-11-02 04:42:13 +00:00
Eli Friedman 3f5eccbe7a Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289.
llvm-svn: 143498
2011-11-01 21:18:39 +00:00
Richard Osborne 56ce0932db Don't fold negative offsets into cp / dp accesses to avoid relocation errors.
This can happen if the address + addend is less than the start of the cp / dp.

llvm-svn: 143459
2011-11-01 11:31:53 +00:00
Richard Osborne 37fe7d6641 Combine various XCore tests for floating point intrinsic support into a single test.
llvm-svn: 143458
2011-11-01 10:51:48 +00:00
Richard Osborne 8591b6b0ab Move various XCore tests to FileCheck
llvm-svn: 143457
2011-11-01 10:41:28 +00:00
Craig Topper fec80c6ad2 Fix operand type for x86 pmadd_ub_sw intrinsic.
llvm-svn: 143455
2011-11-01 07:25:22 +00:00
Craig Topper 9821e75e64 Fix operand type for int_x86_ssse3_phadd_sw_128 intrinsic
llvm-svn: 143336
2011-10-31 07:16:37 +00:00
Craig Topper 242d1f8c73 Test case for X86 FS/GS Base intrinsics
llvm-svn: 143332
2011-10-31 02:15:47 +00:00
Craig Topper cfcfdf2aab Begin adding AVX2 instructions. No selection support yet other than intrinsics.
llvm-svn: 143331
2011-10-31 02:15:10 +00:00
Nick Lewycky aab6169ef6 Switch new .file directive emission off by default, change llc's flag for it to
-enable-dwarf-directory.

llvm-svn: 143326
2011-10-31 01:06:02 +00:00
Benjamin Kramer 7402ee6ec2 X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in.
llvm-svn: 143315
2011-10-30 17:31:21 +00:00
Craig Topper 9cdb9ffa43 Fix return type for X86 mpsadbw instrinsic. The instruction takes in a vector of 8-bit integers, but produces a vector of 16-bit integers.
llvm-svn: 143313
2011-10-30 17:22:45 +00:00
Nadav Rotem c602b2c4de Fix pr11266.
On x86: (shl V, 1) -> add V,V

Hardware support for vector-shift is sparse and in many cases we scalarize the
result. Additionally, on sandybridge padd is faster than shl.

llvm-svn: 143311
2011-10-30 13:24:22 +00:00
Nadav Rotem 1dda6a8ce1 Stabilize the test by specifying an exact cpu target
llvm-svn: 143307
2011-10-30 08:07:50 +00:00
Nadav Rotem bf6568b5d6 Add a new DAGCombine optimization for BUILD_VECTOR.
If all of the inputs are zero/any_extended, create a new simple BV
which can be further optimized by other BV optimizations.

llvm-svn: 143297
2011-10-29 21:23:04 +00:00