From 427ea6f0a759fb1c577681f079903b1ee60ce846 Mon Sep 17 00:00:00 2001
From: Chris Lattner
Date: Fri, 19 May 2006 20:45:52 +0000
Subject: [PATCH] Split FP-stack notes out of the main readme. Next up: splitting out SSE.

llvm-svn: 28399
---
 llvm/lib/Target/X86/README-FPStack.txt |  99 ++++++++++++++++++++++++
 llvm/lib/Target/X86/README.txt         | 100 -------------------------
 2 files changed, 99 insertions(+), 100 deletions(-)
 create mode 100644 llvm/lib/Target/X86/README-FPStack.txt

diff --git a/llvm/lib/Target/X86/README-FPStack.txt b/llvm/lib/Target/X86/README-FPStack.txt
new file mode 100644
index 000000000000..d94fa0219da4
--- /dev/null
+++ b/llvm/lib/Target/X86/README-FPStack.txt
@@ -0,0 +1,99 @@
+//===---------------------------------------------------------------------===//
+// Random ideas for the X86 backend: FP stack related stuff
+//===---------------------------------------------------------------------===//
+
+//===---------------------------------------------------------------------===//
+
+Some targets (e.g. athlons) prefer freep to fstp ST(0):
+http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html
+
+//===---------------------------------------------------------------------===//
+
+On darwin/x86, we should codegen:
+
+        ret double 0.000000e+00
+
+as fld0/ret, not as:
+
+        movl $0, 4(%esp)
+        movl $0, (%esp)
+        fldl (%esp)
+        ...
+        ret
+
+//===---------------------------------------------------------------------===//
+
+This should use fiadd on chips where it is profitable:
+double foo(double P, int *I) { return P+*I; }
+
+We have fiadd patterns now but the followings have the same cost and
+complexity. We need a way to specify the later is more profitable.
+
+def FpADD32m : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
+                   [(set RFP:$dst, (fadd RFP:$src1,
+                                    (extloadf64f32 addr:$src2)))]>;
+               // ST(0) = ST(0) + [mem32]
+
+def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
+                    [(set RFP:$dst, (fadd RFP:$src1,
+                                     (X86fild addr:$src2, i32)))]>;
+                // ST(0) = ST(0) + [mem32int]
+
+//===---------------------------------------------------------------------===//
+
+The FP stackifier needs to be global. Also, it should handle simple permutates
+to reduce number of shuffle instructions, e.g. turning:
+
+fld P   ->      fld Q
+fld Q           fld P
+fxch
+
+or:
+
+fxch    ->      fucomi
+fucomi          jl X
+jg X
+
+Ideas:
+http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
+
+
+//===---------------------------------------------------------------------===//
+
+Add a target specific hook to DAG combiner to handle SINT_TO_FP and
+FP_TO_SINT when the source operand is already in memory.
+
+//===---------------------------------------------------------------------===//
+
+Open code rint,floor,ceil,trunc:
+http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
+http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html
+
+Opencode the sincos[f] libcall.
+
+//===---------------------------------------------------------------------===//
+
+None of the FPStack instructions are handled in
+X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
+folding spill code into the instructions.
+
+//===---------------------------------------------------------------------===//
+
+Currently the x86 codegen isn't very good at mixing SSE and FPStack
+code:
+
+unsigned int foo(double x) { return x; }
+
+foo:
+        subl $20, %esp
+        movsd 24(%esp), %xmm0
+        movsd %xmm0, 8(%esp)
+        fldl 8(%esp)
+        fisttpll (%esp)
+        movl (%esp), %eax
+        addl $20, %esp
+        ret
+
+This will be solved when we go to a dynamic programming based isel.
+
+//===---------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/X86/README.txt b/llvm/lib/Target/X86/README.txt
index 8e752e061e74..5084467657ec 100644
--- a/llvm/lib/Target/X86/README.txt
+++ b/llvm/lib/Target/X86/README.txt
@@ -29,62 +29,6 @@ unsigned test(unsigned long long X, unsigned Y) {
 This can be done trivially with a custom legalizer. What about overflow
 though? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14224
 
-//===---------------------------------------------------------------------===//
-
-Some targets (e.g. athlons) prefer freep to fstp ST(0):
-http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html
-
-//===---------------------------------------------------------------------===//
-
-On darwin/x86, we should codegen:
-
-        ret double 0.000000e+00
-
-as fld0/ret, not as:
-
-        movl $0, 4(%esp)
-        movl $0, (%esp)
-        fldl (%esp)
-        ...
-        ret
-
-//===---------------------------------------------------------------------===//
-
-This should use fiadd on chips where it is profitable:
-double foo(double P, int *I) { return P+*I; }
-
-We have fiadd patterns now but the followings have the same cost and
-complexity. We need a way to specify the later is more profitable.
-
-def FpADD32m : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
-                   [(set RFP:$dst, (fadd RFP:$src1,
-                                    (extloadf64f32 addr:$src2)))]>;
-               // ST(0) = ST(0) + [mem32]
-
-def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
-                    [(set RFP:$dst, (fadd RFP:$src1,
-                                     (X86fild addr:$src2, i32)))]>;
-                // ST(0) = ST(0) + [mem32int]
-
-//===---------------------------------------------------------------------===//
-
-The FP stackifier needs to be global. Also, it should handle simple permutates
-to reduce number of shuffle instructions, e.g. turning:
-
-fld P   ->      fld Q
-fld Q           fld P
-fxch
-
-or:
-
-fxch    ->      fucomi
-fucomi          jl X
-jg X
-
-Ideas:
-http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
-
-
 //===---------------------------------------------------------------------===//
 
 Improvements to the multiply -> shift/add algorithm:
@@ -136,11 +80,6 @@ allocator. Delay codegen until post register allocation.
 
 //===---------------------------------------------------------------------===//
 
-Add a target specific hook to DAG combiner to handle SINT_TO_FP and
-FP_TO_SINT when the source operand is already in memory.
-
-//===---------------------------------------------------------------------===//
-
 Model X86 EFLAGS as a real register to avoid redudant cmp / test. e.g.
 
         cmpl $1, %eax
@@ -181,24 +120,6 @@ flags.
 
 //===---------------------------------------------------------------------===//
 
-Open code rint,floor,ceil,trunc:
-http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
-http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html
-
-//===---------------------------------------------------------------------===//
-
-Combine: a = sin(x), b = cos(x) into a,b = sincos(x).
-
-Expand these to calls of sin/cos and stores:
-   double sincos(double x, double *sin, double *cos);
-   float sincosf(float x, float *sin, float *cos);
-   long double sincosl(long double x, long double *sin, long double *cos);
-
-Doing so could allow SROA of the destination pointers. See also:
-http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
-
-//===---------------------------------------------------------------------===//
-
 The instruction selector sometimes misses folding a load into a compare. The
 pattern is written as (cmp reg, (load p)). Because the compare isn't
 commutative, it is not matched with the load on both sides. The dag combiner
@@ -219,11 +140,6 @@ target specific hook.
 
 //===---------------------------------------------------------------------===//
 
-LSR should be turned on for the X86 backend and tuned to take advantage of its
-addressing modes.
-
-//===---------------------------------------------------------------------===//
-
 When compiled with unsafemath enabled, "main" should enable SSE DAZ mode and
 other fast SSE modes.
 
@@ -293,11 +209,6 @@ The pattern isel got this one right.
 
 //===---------------------------------------------------------------------===//
 
-We need to lower switch statements to tablejumps when appropriate instead of
-always into binary branch trees.
-
-//===---------------------------------------------------------------------===//
-
 SSE doesn't have [mem] op= reg instructions. If we have an SSE instruction
 like this:
 
@@ -351,12 +262,6 @@ much sense (e.g. its an infinite loop). :)
 
 //===---------------------------------------------------------------------===//
 
-None of the FPStack instructions are handled in
-X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
-folding spill code into the instructions.
-
-//===---------------------------------------------------------------------===//
-
 In many cases, LLVM generates code like this:
 
 _test:
@@ -827,11 +732,6 @@ _test:
 
 //===---------------------------------------------------------------------===//
 
-A Mac OS X IA-32 specific ABI bug wrt returning value > 8 bytes:
-http://llvm.org/bugs/show_bug.cgi?id=729
-
-//===---------------------------------------------------------------------===//
-
 X86RegisterInfo::copyRegToReg() returns X86::MOVAPSrr for VR128. Is it possible
 to choose between movaps, movapd, and movdqa based on types of source and
 destination?