have 4 bits per register in the operand encoding), but have undefined
behavior when the operand value is 13 or 15 (SP and PC, respectively).
The trivial coalescer in linear scan sometimes will merge a copy from
SP into a subsequent instruction which uses the copy, and if that
instruction cannot legally reference SP, we get bad code such as:
mls r0,r9,r0,sp
instead of:
mov r2, sp
mls r0, r9, r0, r2
This patch adds a new register class for use by Thumb2 that excludes
the problematic registers (SP and PC) and is used instead of GPR
for those operands which cannot legally reference PC or SP. The
trivial coalescer explicitly requires that the register class
of the destination for the COPY instruction contain the source
register for the COPY to be considered for coalescing. This prevents
errant instructions like that above.
PR7499
llvm-svn: 109842
integers with mov + vdup. 8003375. This is
currently disabled by default because LICM will
not hoist a VDUP, so it pessimizes the code if
the construct occurs inside a loop (8248029).
llvm-svn: 109799
We do sometimes load from a too small stack slot when dealing with x86 arguments
(varargs and smaller-than-32-bit args). It looks like we know what we are doing
in those cases, so I am going to remove the assert instead of artifically
enlarging stack slot sizes.
The assert in storeRegToStackSlot stays in. We don't want to write beyond the
bounds of a stack slot.
llvm-svn: 109764
The size of this object isn't used for anything - technically it is of variable
size.
This avoids a false positive from the assert in
X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.
llvm-svn: 109652
subregister operands like this:
%reg1040:sub_32bit<def> = MOV32rm <fi#-2>, 1, %reg0, 0, %reg0, %reg1040<imp-def>; mem:LD4[FixedStack-2](align=8)
Make them return false when subreg operands are present. VirtRegRewriter is
making bad assumptions otherwise.
This fixes PR7713.
llvm-svn: 109489
we are using AVX and no AVX version of the desired intruction is present,
this is better for incremental dev (without fallbacks it's easier to spot
what's missing). Not sure this is the best hack thought (we can also disable
all HasSSE* predicates by dinamically marking them 'false' if AVX is present)
llvm-svn: 109434
This assumption is not satisfied due to global mergeing.
Workaround the issue by temporary disablinge mergeing of const globals.
Also, ignore LLVM "special" globals. This fixes PR7716
llvm-svn: 109423
appropriate for targets without detailed instruction iterineries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situation; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
llvm-svn: 109300
comments explaining why it was wrong. 8225024.
Fix the real problem in 8213383: the code that splits very large
blocks when no other place to put constants can be found was not
considering the case that the block contained a Thumb tablejump.
llvm-svn: 109282
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live out's and extract_subreg etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
llvm-svn: 109279
SSE, so we can't return floating point values if this
is disabled. Detect this error for clang.
With SSE1 only, f64 is a problem; it can be done, but
neither llvm-gcc nor clang has ever generated correct
code for it. Since nobody noticed this I think it's
OK to treat it as an error for now.
This also handles SSE-sized vectors of floating point.
8207686, 8204109.
llvm-svn: 109201
ARM/PPC/MSP430-specific code (which are the only targets that
implement the hook) can directly reference their target-specific
instrinfo classes.
llvm-svn: 109171
This is probably not the best way to implement "Force LR to
be spilled if the Thumb function size is > 2048." do this,
it should use the branch shortening infrastructure, but I'm
just preserving functionality here.
llvm-svn: 109165
rip out the implementation of X86InstrInfo::GetInstSizeInBytes.
The code being ripped out just implemented a copy and hacked up
version of the (old) instruction encoder, and is buggy and
terrible in other ways. Since "GetInstSizeInBytes" is really
only there to support the JIT's "NeedsExactSize" hook (which
noone is using), just rip out the code. I will rip out the
NeedsExactSize hook next.
This resolves rdar://7617809 - switch X86InstrInfo::GetInstSizeInBytes to use X86MCCodeEmitter
llvm-svn: 109149
mov pc, r1
.align 2
LJTI0_0_0:
.long LBB0_14
This fixes rdar://8213383. No test case since it's not possible to come up with a suitable small one.
llvm-svn: 109076
asmprinter or mangler around. This is option #B for killing off
X86InstrInfo::GetInstSizeInBytes. Option #A (killing
"needsexactsize") was sent for consideration to llvmdev.
llvm-svn: 109056
1) all registers were spilled as xmm, regardless of actual size
2) win64 abi doesn't do the varargs-size-in-%al thing
Still to look into:
xmm6-15 are marked as clobbered by call instructions on win64 even though they aren't.
llvm-svn: 109035
- Fix a typo for PIC check during jmp table lowering
- Also fix the "first jump table basic block is not
considered only reachable by fall through" problem, use this
ad-hoc solution until I come up with something better.
Patch by stetorvs@gmail.com
llvm-svn: 108820
of AsmPrinter and InstLowering into libx86 and out of the
asmprinter subdirectory. Now X86/AsmPrinter just depends on
MC stuff, not all of codegen and LLVM IR.
llvm-svn: 108782
instruction, we only want to allow the one for the current subtarget.
- This also fixes suffix matching for jmp instructions, because it eliminates
the ambiguity between 'jmpl' and 'jmpq'.
llvm-svn: 108746
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR6581.
llvm-svn: 108730
it should set the jump table encloding the EK_Inline. This prevents
a second, unused, copy of the table from being emitted after the function
body. PR7499.
llvm-svn: 108722
- Currently includes a hack to limit ourselves to "In32BitMode" and "In64BitMode", because we don't have the other infrastructure to properly deal with setting SSE, etc. features on X86.
llvm-svn: 108677
- Unfortunate, but necessary for now to handle subtarget instruction matching. Eventually we should factor out the lower level target machine information so we don't need to do this.
llvm-svn: 108664
stack realignment on ARM.
Also check for function attributes as we do on X86 as well as
make explicit that we're checking can as well as needs in this function.
llvm-svn: 108582
operands.
Hopefully this fixes the llvm-gcc-powerpc-darwin9 buildbot. It really shouldn't
since missing memoperands should not affect correctness.
llvm-svn: 108540
FP_REG_KILL instructions are still inserted, but can be disabled by passing
-live-x87 to llc. The X87FPRegKillInserterPass is going to be removed shortly.
CFG edges are partioned into bundles where the x87 stack must be allocated
identically. Code is insertad at the end of each basic block that shuffles the
live FP registers to match the outgoing bundles expectations.
This fix is in preparation for some upcoming register allocator improvements
that may extend the live range of registers beyond a basic block, similar to
LICM. It also provides a nice runtime speedup if you are building with
-mfpmath=387.
llvm-svn: 108529
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.
llvm-svn: 108465
this fixes rdar://8192860. Unfortunately it can only be triggered
with llc because llvm-mc matches another (correctly encoded) version
of this, so no testcase.
llvm-svn: 108454
instructions use different values (e.g., 2-byte or 4-byte alignment).
Also fix ARMInstPrinter to print these alignments as bits instead of bytes.
llvm-svn: 108386
address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.
This fixes PR7610.
llvm-svn: 108327