- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.
llvm-svn: 110946
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.
llvm-svn: 110897
nice farm in the country where it can play with other tests. And bunnies.
It is not clear what is being tested, and the revision history shows a bunch of
random changes to the expected instruction count. Clearly, we are just fudging
it to pass whenever it fails.
llvm-svn: 110118
away from a computer now.
--- Reverse-merging r109881 into '.':
D test/CodeGen/X86/avx-intrinsics-x86.ll
D test/CodeGen/X86/avx-intrinsics-x86_64.ll
llvm-svn: 109959
void foo() { __builtin_unreachable(); }
It will output the following on Darwin X86:
_func1:
Leh_func_begin0:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- part is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.
llvm-svn: 108568
the function. We'll just turn it into a "trap" instruction instead.
The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:
$ cat t.ll
define void @foo() {
entry:
unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
Leh_func_begin0:
## BB#0: ## %entry
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
...
The unwind tables then have bad data in them causing all sorts of problems.
Fixes <rdar://problem/8096481>.
llvm-svn: 108473
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.
llvm-svn: 108465
to keep "Text" in sync with the "pure instructions" section attribute.
Lack of this attribute was preventing the assembler from emitting
multibyte noops instructions for templates (and inlines, and other
coalesced stuff) and was causing the assembler to mismatch .o files.
This fixes rdar://8018335
llvm-svn: 108461
address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.
This fixes PR7610.
llvm-svn: 108327
- Check getBytesToPopOnReturn().
- Eschew ST0 and ST1 for return values.
- Fix the PIC base register initialization so that it doesn't ever
fail to end up the top of the entry block.
llvm-svn: 108039
U utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U test/CodeGen/X86/fast-isel.ll
U test/CodeGen/X86/fast-isel-loads.ll
U include/llvm/Target/TargetLowering.h
U include/llvm/Support/PassNameParser.h
U include/llvm/CodeGen/FunctionLoweringInfo.h
U include/llvm/CodeGen/CallingConvLower.h
U include/llvm/CodeGen/FastISel.h
U include/llvm/CodeGen/SelectionDAGISel.h
U lib/CodeGen/LLVMTargetMachine.cpp
U lib/CodeGen/CallingConvLower.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U lib/CodeGen/SelectionDAG/FastISel.cpp
U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U lib/CodeGen/SelectionDAG/TargetLowering.cpp
U lib/Target/XCore/XCoreISelLowering.cpp
U lib/Target/XCore/XCoreISelLowering.h
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86FastISel.cpp
U lib/Target/X86/X86ISelLowering.h
llvm-svn: 107987
It is OK for an alias live range to overlap if there is a copy to or from the
physical register. CoalescerPair can work out if the copy is coalescable
independently of the alias.
This means that we can join with the actual destination interval instead of
using the getOrigDstReg() hack. It is no longer necessary to merge clobber
ranges into subregisters.
llvm-svn: 107695
the example in the testcase, we now generate:
_test1: ## @test1
movss 4(%esp), %xmm0
addss 8(%esp), %xmm0
movl 12(%esp), %eax
movss %xmm0, (%eax)
ret
instead of:
_test1: ## @test1
subl $20, %esp
movl 24(%esp), %eax
movq %mm0, (%esp)
movq %mm0, 8(%esp)
movss (%esp), %xmm0
addss 12(%esp), %xmm0
movss %xmm0, (%eax)
addl $20, %esp
ret
v2f32 support did not work reliably because most of the X86
backend didn't know it was legal. It was apparently only added
to support returning source-level v2f32 values in MMX registers
in x86-32 mode. If ABI compatibility is important on this
GCC-extended-vector type for some reason, then the frontend
should generate IR that returns v2i32 instead of v2f32. However,
we generally don't try very hard to be abi compatible on gcc
extended vectors.
llvm-svn: 107601
v2f32 as legal in 32-bit mode. It is just as terrible there,
but I just care about x86-64 and noone claims it is valuable
in 64-bit mode.
llvm-svn: 107600
- X86 unfolding should check if the instructions being unfolded has memoperands.
If there is no memoperands, then it must assume conservative alignment. If this
would introduce an expensive sse unaligned load / store, then unfoldMemoryOperand
etc. should not unfold the instruction.
llvm-svn: 107509
PrologEpilog code, and use it to determine whether
the asm forces stack alignment or not. gcc consistently
does not do this for GCC-style asms; Apple gcc inconsistently
sometimes does it for asm blocks. There is no
convenient place to put a bit in either the SDNode or
the MachineInstr form, so I've added an extra operand
to each; unlovely, but it does allow for expansion for
more bits, should we need it. PR 5125. Some
existing testcases are affected.
The operand lists of the SDNode and MachineInstr forms
are indexed with awesome mnemonics, like "2"; I may
fix this someday, but not now. I'm not making it any
worse. If anyone is inspired I think you can find all
the right places from this patch.
llvm-svn: 107506
Objective-C metadata types which should be marked as "weak", but which the
linker will remove upon final linkage. However, this linkage isn't specific to
Objective-C.
For example, the "objc_msgSend_fixup_alloc" symbol is defined like this:
.globl l_objc_msgSend_fixup_alloc
.weak_definition l_objc_msgSend_fixup_alloc
.section __DATA, __objc_msgrefs, coalesced
.align 3
l_objc_msgSend_fixup_alloc:
.quad _objc_msgSend_fixup
.quad L_OBJC_METH_VAR_NAME_1
This is different from the "linker_private" linkage type, because it can't have
the metadata defined with ".weak_definition".
Currently only supported on Darwin platforms.
llvm-svn: 107433
have to be registers, per gcc documentation. This affects
the logic for determining what "g" should lower to. PR 7393.
A couple of existing testcases are affected.
llvm-svn: 107079
When an instruction has tied operands and physreg defines, we must take extra
care that the tied operands conflict with neither physreg defs nor uses.
The special treatment is given to inline asm and instructions with tied operands
/ early clobbers and physreg defines.
This fixes PR7509.
llvm-svn: 107043
CopyFromReg nodes for aliasing registers (AX and AL). This confuses the fast
register allocator.
Instead of CopyFromReg(AL), use ExtractSubReg(CopyFromReg(AX), sub_8bit).
This fixes PR7312.
llvm-svn: 106934
for an "i" constraint should get lowered; PR 6309. While
this argument was passed around a lot, this is the only
place it was used, so it goes away from a lot of other
places.
llvm-svn: 106893
address requires a register or secondary load to compute
(most PIC modes). This improves "g" constraint handling. 8015842.
The test from 2007 is attempting to test the fix for PR1761,
but since -relocation-model=static doesn't work on Darwin
x86-64, it was not testing what it was supposed to be testing
and was passing erroneously. Fixed to use Linux x86-64.
llvm-svn: 106779
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.
I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.
llvm-svn: 106748