Commit Graph

6085 Commits

Author SHA1 Message Date
Peter Collingbourne 82e657b509 Object: Prepend __imp_ when mangling a dllimport symbol in IRObjectFile.
We cannot prepend __imp_ in the IR mangler because a function reference may
be emitted unmangled in a constant initializer. The linker is expected to
resolve such references to thunks. This is covered by the new test case.

Strictly speaking we ought to emit two undefined symbols, one with __imp_ and
one without, as we cannot know which symbol the final object file will refer
to. However, this would require rather intrusive changes to IRObjectFile,
and lld works fine without it for now.

This reimplements r239437, which was reverted in r239502.

Differential Revision: http://reviews.llvm.org/D10400

llvm-svn: 239560
2015-06-11 21:42:18 +00:00
Reid Kleckner 2691c59e97 Revert "Fix merges of non-zero vector stores"
This reverts commit r239539.

It was causing SDAG assertions while building freetype.

llvm-svn: 239543
2015-06-11 17:25:24 +00:00
Matt Arsenault e23a063dc3 Fix merges of non-zero vector stores
Now actually stores the non-zero constant instead of 0.
I somehow forgot to include this part of r238108.

The test change was just an independent instruction order swap,
so just add another check line to satisfy CHECK-NEXT.

llvm-svn: 239539
2015-06-11 16:03:52 +00:00
Simon Pilgrim 5965680d53 [X86][SSE] Vectorized i8 and i16 shift operators
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:

1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.

The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.

Tested on SSE2, SSE41 and AVX machines.

Differential Revision: http://reviews.llvm.org/D9474

llvm-svn: 239509
2015-06-11 07:46:37 +00:00
Sanjay Patel 08829bac81 [x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.

Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.

The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.

This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.

Differential Revision: http://reviews.llvm.org/D10321

llvm-svn: 239486
2015-06-10 20:32:21 +00:00
Reid Kleckner c87a6faba1 [WinEH] _except_handlerN uses 0 instead of 1 to indicate catch-all
Our usage of 1 was a holdover from __C_specific_handler.

llvm-svn: 239482
2015-06-10 18:14:07 +00:00
Igor Laevsky 346ff628f7 [StatepointLowering] Reuse stack slots across basic blocks
During statepoint lowering we can sometimes avoid spilling of the value if we know that it was already spilled for previous statepoint.
We were doing this by checking if incoming statepoint value was lowered into load from stack slot. This was working only in boundaries of one basic block.

But instead of looking at the lowered node we can look directly at the llvm-ir value and if it was gc.relocate (or some simple modification of it) look up stack slot for it's derived pointer and reuse stack slot from it. This allows us to look across basic block boundaries.

Differential Revision: http://reviews.llvm.org/D10251

llvm-svn: 239472
2015-06-10 12:31:53 +00:00
Elena Demikhovsky 00c9ad5ec2 AVX-512: Fixed a bug in comparison of i1 vectors.
cmp eq should give kxnor instruction
cmp neq should give kxor 

https://llvm.org/bugs/show_bug.cgi?id=23631

llvm-svn: 239460
2015-06-10 06:49:28 +00:00
Reid Kleckner 673de15af9 [WinEH] Call llvm.stackrestore in __except blocks
We have to do this manually, the runtime only sets up ebp. Fixes a crash
when returning after catching an exception.

llvm-svn: 239451
2015-06-10 01:34:54 +00:00
Reid Kleckner 2bc93ca846 [WinEH] Emit .safeseh directives for all 32-bit exception handlers
Use a "safeseh" string attribute to do this. You would think we chould
just accumulate the set of personalities like we do on dwarf, but this
fails to account for the LSDA-loading thunks we use for
__CxxFrameHandler3. Each of those needs to make it into .sxdata as well.
The string attribute seemed like the most straightforward approach.

llvm-svn: 239448
2015-06-10 01:02:30 +00:00
Reid Kleckner f12c030f48 [WinEH] Add 32-bit SEH state table emission prototype
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.

The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.

llvm-svn: 239433
2015-06-09 21:42:19 +00:00
Akira Hatanaka d9699bc7bd Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions
that was resetting it.

Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis. 
 
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.

Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.

rdar://problem/13752163

Differential Revision: http://reviews.llvm.org/D10099

llvm-svn: 239427
2015-06-09 19:07:19 +00:00
Simon Pilgrim 4af289d0f2 [X86][SSE] Added lzcnt vector tests.
llvm-svn: 239333
2015-06-08 19:58:43 +00:00
Matthias Braun 6f8db0e1a7 X86: Reject register operands with obvious type mismatches.
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.

Related to rdar://21042280

Differential Revision: http://reviews.llvm.org/D10260

llvm-svn: 239309
2015-06-08 16:56:23 +00:00
Simon Pilgrim 4791f6d89b [DAGCombiner] Added CTLZ vector constant folding support.
llvm-svn: 239305
2015-06-08 16:19:00 +00:00
Igor Breger 00d9f8457b AVX-512: Implemented 256/128bit VALIGND/Q instructions for SKX and KNL
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.

Differential Revision: http://reviews.llvm.org/D10310

llvm-svn: 239300
2015-06-08 14:03:17 +00:00
Simon Pilgrim c789e1d57b [DAGCombiner] Added CTTZ vector constant folding support.
llvm-svn: 239293
2015-06-08 09:57:09 +00:00
Simon Pilgrim 856fc94a2f [X86] Added tzcnt vector tests.
llvm-svn: 239264
2015-06-07 21:01:34 +00:00
Simon Pilgrim 3a7718038d [X86] Added BitScanForward/BitScanReverse memory folding + tests
llvm-svn: 239257
2015-06-07 18:34:25 +00:00
Simon Pilgrim 19cefe6903 Fixed line endings
llvm-svn: 239253
2015-06-07 16:09:48 +00:00
Simon Pilgrim 68cd237f57 [DAGCombiner] Added CTPOP vector constant folding support.
Added tests to the existing SSE/AVX test files.

llvm-svn: 239252
2015-06-07 15:37:14 +00:00
Simon Pilgrim 7de557e5a9 [X86][AVX2] Added tests for v32i8 vector shifts
Currently still scalarized, but D9474 should remedy that.

llvm-svn: 239146
2015-06-05 12:35:36 +00:00
Andrea Di Biagio eb33134ce7 Simplify code; NFC.
Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into
CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using
update_llc_test_checks.py.

llvm-svn: 239142
2015-06-05 10:29:55 +00:00
Simon Pilgrim b4c562de87 [X86][SSE] Added tests for i8/i16 vector shifts
Currently still scalarized, but D9474 should remedy that.

llvm-svn: 239136
2015-06-05 08:24:23 +00:00
Swaroop Sridhar 70d18df18f Statepoint: Fix handling of Far Immediate calls
gc.statepoint intrinsics with a far immediate call target 
were lowered incorrectly as pc-rel32 calls.

This change fixes the problem, and generates an indirect call 
via a scratch register.

For example: 

Intrinsic:
  %safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* inttoptr (i64 140727162896504 to void ()*), i32 0, i32 0, i32 0, i32 0)

Old Incorrect Lowering:
  callq 140727162896504

New Correct Lowering:
  movabsq $140727162896504, %rax 
  callq *%rax

In lowerCallFromStatepoint(), the callee-target was modified and 
represented as a "TargetConstant" node, rather than a "Constant" node.
Undoing this modification enabled LowerCall() to generate the 
correct CALL instruction.

llvm-svn: 239114
2015-06-04 23:03:21 +00:00
Charles Davis da280728b6 [Target/X86] Don't use callee-saved registers in a Win64 tail call on non-Windows.
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.

Should fix PR23710.

Test Plan: Added test for the correct behavior based on the case I posted to PR23710.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10258

llvm-svn: 239111
2015-06-04 22:50:05 +00:00
Benjamin Kramer ff0fb6936b [SDAG switch lowering] Fix switch case -> or merging for 0 and INT_MIN
The big/small ordering here is based on signed values so SmallValue will
be INT_MIN and BigValue 0. This shouldn't be a problem but the code
assumed that BigValue always had more bits set than SmallValue.

We used to just miss the transformation, but a recent refactoring of
mine turned this into an assertion failure.

llvm-svn: 239105
2015-06-04 22:05:51 +00:00
Andrea Di Biagio 9ac8a6b13d [DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.

However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.

Example:

define <2 x double> @example(<2 x double> %A) {
entry:
  %0 = extractelement <2 x double> %A, i32 0
  %1 = insertelement <2 x double> undef, double %0, i32 0
  %2 = insertelement <2 x double> %1, double -0.0, i32 1
  ret <2 x double> %2
}

Before this patch, llc (with -mattr=+sse4.1) wrongly generated
  movq   %xmm0, %xmm0  # xmm0 = xmm0[0],zero

So, the sign bit of the negative zero was effectively lost.

This patch fixes the problem by adding explicit checks for positive zero.

With this patch, llc produces the following code for the example above:
  movhpd .LCPI0_0(%rip), %xmm0

where .LCPI0_0 referes to a 'double -0'.

llvm-svn: 239070
2015-06-04 19:15:01 +00:00
Hans Wennborg d922915685 Switch lowering: fix assert in buildBitTests (PR23738)
When checking (High - Low + 1).sle(BitWidth), BitWidth would be truncated
to the size of the left-hand side. In the case of this PR, the left-hand
side was i4, so BitWidth=64 got truncated to 0 and the assert failed.

llvm-svn: 239048
2015-06-04 15:55:00 +00:00
Elena Demikhovsky 2f1a0dabd0 AVX-512: I brought back vector-shuffle-512-v8.ll test.
I re-generated it after all AVX-512 shuffle optimizations.

llvm-svn: 239026
2015-06-04 07:49:56 +00:00
Sanjay Patel 667a7e2a0f make reciprocal estimate code generation more flexible by adding command-line options (3rd try)
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.

The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 239001
2015-06-04 01:32:35 +00:00
Asaf Badouh 402ebb34af re-apply 238809
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991

llvm-svn: 238923
2015-06-03 13:41:48 +00:00
Elena Demikhovsky 21de893377 AVX-512: VSHUFPD instruction selection - code improvements
llvm-svn: 238918
2015-06-03 11:21:01 +00:00
Rafael Espindola cf8beece97 Revert "make reciprocal estimate code generation more flexible by adding command-line options (2nd try)"
This reverts commit r238842.

It broke -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 238900
2015-06-03 05:32:44 +00:00
Sanjoy Das 513aadecac [SelectionDAG] Fix PR23603.
Summary:
LLVM's MI level notion of invariant_load is different from LLVM's IR
level notion of invariant_load with respect to dereferenceability.  The
IR notion of invariant_load only guarantees that all *non-faulting*
invariant loads result in the same value.  The MI notion of invariant
load guarantees that the load can be legally moved to any location
within its containing function.  The MI notion of invariant_load is
stronger than the IR notion of invariant_load -- an MI invariant_load is
an IR invariant_load + a guarantee that the location being loaded from
is dereferenceable throughout the function's lifetime.

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10075

llvm-svn: 238881
2015-06-02 22:33:30 +00:00
Sanjay Patel 6f031d848e make reciprocal estimate code generation more flexible by adding command-line options (2nd try)
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 238842
2015-06-02 15:28:15 +00:00
Elena Demikhovsky 44a129c533 AVX-512: Shorten implementation of lowerV16X32VectorShuffle()
using lowerVectorShuffleWithSHUFPS() and other shuffle-helpers routines.
Added matching of VALIGN instruction.

llvm-svn: 238830
2015-06-02 13:43:18 +00:00
Asaf Badouh 8d897dd05f revert 238809
llvm-svn: 238810
2015-06-02 07:45:19 +00:00
Asaf Badouh 17de10f37e AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.

llvm-svn: 238809
2015-06-02 07:18:14 +00:00
Elena Demikhovsky 67afb630e1 AVX-512: Optimized vector shuffle for v16f32 and v16i32 types.
llvm-svn: 238743
2015-06-01 13:26:18 +00:00
Elena Demikhovsky 0c41088ebf AVX-512: Implemented vector shuffle lowering for v8i64 and v8f64 types.
I removed the vector-shuffle-512-v8.ll, it is auto-generated test, not valid any more.

llvm-svn: 238735
2015-06-01 09:49:53 +00:00
Elena Demikhovsky dd68d0cb0f AVX-512: Fixed a bug in compress and expand intrinsics.
By Igor Breger (igor.breger@intel.com)

llvm-svn: 238724
2015-06-01 06:30:13 +00:00
Chandler Carruth cb58910ce8 [x86] Unify the horizontal adding used for popcount lowering taking the
best approach of each.

For vNi16, we use SHL + ADD + SRL pattern that seem easily the best.

For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both are the exact same uops though. Hard to be confident of
anything past that.

If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHl
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with both SSE3, SSSE3, and AVX selected as
I saw unique behavior in each of these buckets with IACA all of which
should be checked against measured performance.

But this patch is still a useful improvement by dropping duplicate work
and getting the much nicer PSADBW lowering for v2i64.

I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.

llvm-svn: 238652
2015-05-30 10:35:03 +00:00
Chandler Carruth 3bedf4407b [x86] Update the order of instructions after I switched to a bitcast
helper that skips creating a cast when it isn't necessary.

It's really somewhat concerning that this was caused by the the presence
of a no-op bitcast, but...

llvm-svn: 238642
2015-05-30 06:02:37 +00:00
Chandler Carruth 2599da3cfd [x86] Restore the bitcasts I removed when refactoring this to avoid
shifting vectors of bytes as x86 doesn't have direct support for that.

This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.

In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repeatative logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.

llvm-svn: 238637
2015-05-30 04:05:11 +00:00
Chandler Carruth 6ba9730a4e [x86] Implement a faster vector population count based on the PSHUFB
in-register LUT technique.

Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html

The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.

On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.

The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.

I have extended Bruno's patch in the following ways:

0) I committed the new tests with baseline sequences so this shows
   a diff, and regenerated the tests using the update scripts.

1) Bruno had noticed and mentioned in IRC a redundant mask that
   I removed.

2) I introduced a particular optimization for the i32 vector cases where
   we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
   + PSADBW to compute doubled high i32 popcounts. This takes advantage
   of the fact that to line up the high i32 popcounts we have to shift
   them anyways, and we can shift them by one fewer bit to effectively
   divide the count by two. While the PSHUFD based horizontal add is no
   faster, it doesn't require registers or load traffic the way a mask
   would, and provides more ILP as it happens on different ports with
   high throughput.

3) I did some code cleanups throughout to simplify the implementation
   logic.

4) I refactored it to continue to use the parallel bitmath lowering when
   SSSE3 is not available to preserve the performance of that version on
   SSE2 targets where it is still much better than scalarizing as we'll
   still do a bitmath implementation of popcount even in scalar code
   there.

With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.

With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.

Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.

Differential Revision: http://reviews.llvm.org/D10084

llvm-svn: 238636
2015-05-30 03:20:59 +00:00
Chandler Carruth c2e400de83 [x86] Restructure the parallel bitmath lowering of popcount into
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.

This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.

The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.

The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certaily fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.

llvm-svn: 238635
2015-05-30 03:20:55 +00:00
Reid Kleckner e6531a5588 [WinEH] Adjust the 32-bit SEH prologue to better match reality
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.

This is in preparation for lowering the llvm.eh.exceptioninfo().

llvm-svn: 238627
2015-05-29 22:57:46 +00:00
Reid Kleckner 173a72524f Disable FP elimination in funcs using 32-bit MSVC EH personalities
The value in 'ebp' acts as an implicit argument to the outlined
handlers, and is recovered with frameaddress(1).

llvm-svn: 238619
2015-05-29 21:58:11 +00:00
Matthias Braun 165d467125 MachineCopyPropagation: Remove the copies instead of using KILL instructions.
For some history here see the commit messages of r199797 and r169060.

The original intent was to fix cases like:

%EAX<def> = COPY %ECX<kill>, %RAX<imp-def>
%RCX<def> = COPY %RAX<kill>

where simply removing the copies would have RCX undefined as in terms of
machine operands only the ECX part of it is defined. The machine
verifier would complain about this so 169060 changed such COPY
instructions into KILL instructions so some super-register imp-defs
would be preserved. In r199797 it was finally decided to always do this
regardless of super-register defs.

But this is wrong, consider:
R1 = COPY R0
...
R0 = COPY R1
getting changed to:
R1 = KILL R0
...
R0 = KILL R1

It now looks like R0 dies at the first KILL and won't be alive until the
second KILL, while in reality R0 is alive and must not change in this
part of the program.

As this only happens after register allocation there is not much code
still performing liveness queries so the issue was not noticed.  In fact
I didn't manage to create a testcase for this, without unrelated changes
I am working on at the moment.

The fix is simple: As of r223896 the MachineVerifier allows reads from
partially defined registers, so the whole transforming COPY->KILL thing
is not necessary anymore. This patch also changes a similar (but more
benign case as the def and src are the same register) case in the
VirtRegRewriter.

Differential Revision: http://reviews.llvm.org/D10117

llvm-svn: 238588
2015-05-29 18:19:25 +00:00