was saying that the matching superregister class of GR32_NOREX in GR64_NOREX_NOSP
is GR64_NOREX, which drops the NOSP constraint. This fixes PR10032.
llvm-svn: 132225
subregisters:
When a value is in a subregister, at least report the location as being
the superregister. We should extend the .td files to encode the bit
range so that we can produce a DW_OP_bit_piece.
llvm-svn: 132224
According to PR2536, the old spiller had trouble with the IMPLICIT_DEF in this
code:
%reg1028<def> = MOV16rm %reg0, 1, %reg0, <ga:g_5>, Mem:LD(2,2) [g_5 + 0]
%reg1039<def> = IMPLICIT_DEF
%reg1038<def> = INSERT_SUBREG %reg1039, %reg1028, 2
%reg1025<def> = AND32ri %reg1038, 65534, %%EFLAGS<imp-def>
However, today we emit a zero-extending load instead:
%vreg10<def> = MOVZX32rm16 %noreg, 1, %noreg, <ga:@g_5>, %noreg; %mem:LD2[@g_5] GR32:%vreg10
%vreg0<def> = AND32ri %vreg10, 65534, %%EFLAGS<imp-def,dead>; %GR32:%vreg0,%vreg10
This makes the test pointless since it no longer creates the spiller hazard.
llvm-svn: 132210
- the selector for the landing pad must provide all available information
about the handlers, filters, and cleanups within that landing pad
- calls to _Unwind_Resume must be converted to branches to the enclosing
lpad so as to avoid re-entering the unwinder when the lpad claimed it
was going to handle the exception in some way
This is quite specific to libUnwind-based unwinding. In an effort to not
interfere too badly with other unwinders, and with existing hacks in frontends,
this only triggers on _Unwind_Resume (not _Unwind_Resume_or_Rethrow) and does
nothing with selectors if it cannot find a selector call for either lpad.
llvm-svn: 132200
- Flip order of bitfields. This gets our output matching GAS.
- Handle case where the end of the prolog wasn't specified.
- If the resulting unwind info struct is less than 8 bytes, pad to 8 bytes.
Add a test for the latter two.
llvm-svn: 132188
Rework how the MCWin64EHUnwindInfo instances are stored. Fix issues with
chained unwind areas exposed by the test that were related to this.
The ChainedParent field had the wrong address, because when the chained unwind
info was added, the addresses shifted around. Now we store the pointers to the
structures, which are now allocated from the MC heap.
llvm-svn: 132106
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
resusing existing derived IV defs. See HoistStep.
llvm-svn: 132103
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).
rdar://9437928 .
llvm-svn: 132099
them.
I had to add a special SwitchSectionNoChange method to MCStreamer just for
.seh_handlerdata. If this isn't OK, please let me know, and I'll find some
other way to fix .seh_handlerdata streaming.
llvm-svn: 132084
after checking for a GEP, so that it matches what GetUnderlyingObject
does. This fixes an obscure bug turned up by bugpoint in the testcase
for PR9931.
llvm-svn: 131971
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948
The following improvements are accomplished as a result of applying this patch:
- Fixed frame objects' offsets (relative to either the virtual frame pointer or
the stack pointer) are set before instruction selection is completed. There is
no need to wait until Prologue/Epilogue Insertion is run to set them.
- Calculation of final offsets of fixed frame objects is straightforward. It is
no longer necessary to assign negative offsets to fixed objects for incoming
arguments in order to distinguish them from the others.
- Since a fixed object has its relative offset set during instruction
selection, there is no need to conservatively set its alignment to 4.
- It is no longer necessary to reorder non-fixed frame objects in
MipsFrameLowering::adjustMipsStackFrame.
llvm-svn: 131915
aligned.
Teach memcpyopt to not give up all hope when confonted with an underaligned
memcpy feeding an overaligned byval. If the *source* of the memcpy can be
determined to be adequeately aligned, or if it can be forced to be, we can
eliminate the memcpy.
This addresses PR9794. We now compile the example into:
define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
%call = call i32 @g(%struct.p* byval align 8 %q) nounwind
ret i32 %call
}
in both x86-64 and x86-32 mode. We still don't get a tailcall though,
because tailcalls apparently can't handle byval.
llvm-svn: 131884
result is non-zero. Implement an example optimization (PR9814), which allows us to
transform:
A / ((1 << B) >>u 2)
into:
A >>u (B-2)
which we compile into:
_divu3: ## @divu3
leal -2(%rsi), %ecx
shrl %cl, %edi
movl %edi, %eax
ret
instead of:
_divu3: ## @divu3
movb %sil, %cl
movl $1, %esi
shll %cl, %esi
shrl $2, %esi
movl %edi, %eax
xorl %edx, %edx
divl %esi, %eax
ret
llvm-svn: 131860
failing to form a memset, then having to delete it" but my approximation
isn't safe for self recurrent loops. Instead of doign a hack, just
do it the right way.
llvm-svn: 131858
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this to produce slightly better code without any extra cleanup passes (AFAICT this was the only place in -simplifycfg where now-dead conditions of replaced terminators weren't being cleaned up). The only other user of this function is -sccp, but I didn't read that thoroughly enough to figure out whether it might be holding pointers to instructions that could be deleted by this.
llvm-svn: 131855
Original log message:
When BasicAA can determine that two pointers have the same base but
differ by a dynamic offset, return PartialAlias instead of MayAlias.
See the comment in the code for details. This fixes PR9971.
llvm-svn: 131809
text section.
Assume the following bit of annotated assembly:
.section .data.rel.ro,"aw",%progbits
.align 2
.LAlpha:
.long startval(GOTOFF)
.text
.align 2
.type main,%function
.align 4
main: ;;; assume "main" starts at offset 0x20
0x0 push {r11, lr}
0x4 movw r0, :lower16:(.LAlpha-(.LBeta+8))
;;; ==> (.AddrOf(.LAlpha) - ((.AddrOf(.LBeta) - .AddrOf(".")) + 8)
;;; ==> (??? - ((16-4) + 8) = -20
0x8 movt r0, :upper16:(.LAlpha-(.LBeta+8))
;;; ==> (.AddrOf(.LAlpha) - ((.AddrOf(.LBeta) - .AddrOf(".")) + 8)
;;; ==> (??? - ((16-8) + 8) = -16
0xc ... blah
.LBeta:
0x10 add r0, pc, r0
0x14 ... blah
.LGamma:
0x18 add r1, pc, r1
Above snippet results in the following relocs in the .o file for the
first pair of movw/movt instructions
00000024 R_ARM_MOVW_PREL_NC .LAlpha
00000028 R_ARM_MOVT_PREL .LAlpha
And the encoded instructions in the .o file for main: must be
00000020 <main>:
20: e92d4800 push {fp, lr}
24: e30f0fec movw r0, #65516 ; 0xffec i.e. -20
28: e34f0ff0 movt r0, #65520 ; 0xfff0 i.e. -16
However, llc (prior to this commit) generates the following sequence
00000020 <main>:
20: e92d4800 push {fp, lr}
24: e30f0fec movw r0, #65516 ; 0xffec - i.e. -20
28: e34f0fff movt r0, #65535 ; 0xffff - i.e. -1
What has to happen in the ArmAsmBackend is that if the relocation is PC
relative, the 16 bits encoded as part of movw and movt must be both addends,
not addresses. It makes sense to encode addresses by right shifting the value
by 16, but the result is incorrect for PIC.
i.e., the right shift by 16 for movt is ONLY valid for the NON-PCRel case.
This change agrees with what GNU as does, and makes the PIC code run.
MC/ARM/elf-movt.s covers this case.
llvm-svn: 131674
x86_64 sibcall logic. I've filed PR9943 for the sibcall problem, and
this patch alters the testcase to work around the flaw. When PR9943
is fixed, this patch should be reverted.
llvm-svn: 131557
happily accept things like "sext <2 x i32> to <999 x i64>". It would
also accept "sext <2 x i32> to i64", though the verifier would catch
that later. Fixed by having castIsValid check that vector lengths match
except when doing a bitcast. (2) When creating a cast instruction, check
that the cast is valid (this was already done when creating constexpr
casts). While there, replace getScalarSizeInBits (used to allow more
vector casts) with getPrimitiveSizeInBits in getCastOpcode and isCastable
since vector to vector casts are now handled explicitly by passing to the
element types; i.e. this bit should result in no functional change.
llvm-svn: 131532
As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
llvm-svn: 131516
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.
This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic. Prioritizing
register allocation according to spill weight can cause more registers to be
used.
llvm-svn: 131436
("T is 1 if the target symbol S has type STT_FUNC and the
symbol addresses a Thumb instruction ;it is 0 otherwise."
from "ELF for the ARM Architecture" 4.7.1.2)
Patch by Koan-Sin Tan!
llvm-svn: 131406
by non-CMP expressions. The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester. Alas, the ARM assembly would be very difficult to check with
FileCheck.
The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790
llvm-svn: 131261
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.
llvm-svn: 131183
at the start of basic blocks to their common predecessor. It's actually quite
common (e.g. about 50 times in JM/lencod) and has shown to be a nice code size
benefit. e.g.
pushq %rax
testl %edi, %edi
jne LBB0_2
## BB#1:
xorb %al, %al
popq %rdx
ret
LBB0_2:
xorb %al, %al
callq _foo
popq %rdx
ret
=>
pushq %rax
xorb %al, %al
testl %edi, %edi
je LBB0_2
## BB#1:
callq _foo
LBB0_2:
popq %rdx
ret
rdar://9145558
llvm-svn: 131172
this clang will use .debug_frame in, for example,
clang -g -c -m32 test.c
This matches gcc's behaviour. It looks like .debug_frame is a bit bigger
than .eh_frame, but has the big advantage of not being allocated.
llvm-svn: 131140
often expressed as "x >= y ? x : y", there is a good chance we can extract
the existing "x >= y" from it and use that as a replacement for "max(x,y)==x".
llvm-svn: 131049
This can't be just an assertion, users can always write impossible inline
assembly. Such an assembly statement should be included in the error message.
llvm-svn: 131024
Most of these tests require a single mov instruction that can come either before
or after a 2-addr instruction. -join-physregs changes the behavior, but the
results are equivalent.
llvm-svn: 130891
landing pad as its successor.
SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>
llvm-svn: 130881
Original message:
Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130877
These tests all follow the same pattern:
mov r2, r0
movs r0, #0
$CMP r2, r1
it eq
moveq r0, #1
bx lr
The first 'mov' can be eliminated by rematerializing 'movs r0, #0' below the
test instruction:
$CMP r0, r1
mov.w r0, #0
it eq
moveq r0, #1
bx lr
So far, only physreg coalescing can do that. The register allocators won't yet
split live ranges just to eliminate copies. They can learn, but this particular
problem is not likely to show up in real code. It only appears because r0 is
used for both the function argument and return value.
llvm-svn: 130858