llvm-project/llvm/test/CodeGen/X86/win32-spill-xmm.ll

; RUN: llc -mcpu=generic -mtriple=i686-pc-windows-msvc -mattr=+sse < %s | FileCheck %s

; Check proper alignment of spilled vector

; CHECK-LABEL: spill_ok
; CHECK: subl    $32, %esp
; CHECK: movaps  %xmm3, (%esp)
; CHECK: movl    $0, 16(%esp)
; CHECK: calll   _bar
define void @spill_ok(i32, <16 x float> *) {
entry:
  %2 = alloca i32, i32 %0
  %3 = load <16 x float>, <16 x float> * %1, align 64
  tail call void @bar(<16 x float> %3, i32 0) nounwind
  ret void
}

declare void @bar(<16 x float> %a, i32 %b)

; Check that proper alignment of the spilled vector does not affect varargs

; CHECK-LABEL: vargs_not_affected
; CHECK: movl 28(%esp), %eax
define i32 @vargs_not_affected(<4 x float> %v, i8* %f, ...) {
entry:
  %ap = alloca i8*, align 4
  %0 = bitcast i8** %ap to i8*
  call void @llvm.va_start(i8* %0)
  %argp.cur = load i8*, i8** %ap, align 4
  %argp.next = getelementptr inbounds i8, i8* %argp.cur, i32 4
  store i8* %argp.next, i8** %ap, align 4
  %1 = bitcast i8* %argp.cur to i32*
  %2 = load i32, i32* %1, align 4
  call void @llvm.va_end(i8* %0)
  ret i32 %2
}

declare void @llvm.va_start(i8*)

declare void @llvm.va_end(i8*)

Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. The code generator exploits these requirements by using move instructions that have similar alignment requirements, e.g., movaps on x86.

Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example, if we have

    %3 = load <16 x float>, <16 x float>* %1, align 64
    call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

    movaps  32(%ecx), %xmm2
    movaps  (%ecx), %xmm0
    movaps  16(%ecx), %xmm1
    movaps  48(%ecx), %xmm3
    subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards
    movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
    movl    $0, 16(%esp)
    calll   __bar

To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple of the maximal alignment seen during its computation. With this change we get proper alignment:

    subl    $32, %esp
    movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337
llvm-svn: 248786
2015-09-29 18:12:57 +08:00
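The rounding described in this commit message can be illustrated with a small C++ sketch. This is not the code from the LLVM patch: the helper names `alignUp` and `keepsAlignment` are hypothetical, and the sketch assumes the alignment is a power of two. It only reproduces the arithmetic that turns the 20-byte adjustment into a 32-byte one when the maximal alignment seen is 16.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not LLVM's implementation).
// Rounds `size` up to the next multiple of `align`.
// Assumes `align` is a power of two.
constexpr std::uint64_t alignUp(std::uint64_t size, std::uint64_t align) {
    return (size + align - 1) & ~(align - 1);
}

// Hypothetical helper: a stack pointer that is `align`-byte aligned stays
// aligned after subtracting `adjust` only if `adjust` is a multiple of `align`.
constexpr bool keepsAlignment(std::uint64_t adjust, std::uint64_t align) {
    return adjust % align == 0;
}

// The unrounded adjustment from the example (subl $20) breaks 16-byte alignment.
static_assert(!keepsAlignment(20, 16), "20 is not a multiple of 16");

// Rounding up to the maximal alignment gives the adjustment the test expects
// (subl $32), which preserves 16-byte alignment for the movaps store.
static_assert(alignUp(20, 16) == 32, "20 rounds up to 32");
static_assert(keepsAlignment(alignUp(20, 16), 16), "32 is a multiple of 16");
```

The extra 12 bytes are padding: the vector is still stored at (%esp) and the i32 at 16(%esp), as the CHECK lines for spill_ok require.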

[X86][MS] Fix the alignment mismatch of vector variable arguments on Win32

On the callee side, vector variable arguments are aligned to 4 bytes, which matches MSVC. The caller, however, aligns them to the size of the vector argument, which results in runtime failures. This patch fixes the problem by trimming the alignment to 4 bytes for variable arguments on Win32.

Fixed vector arguments are passed by pointer on Win32, so they don't have this problem.

I couldn't find documentation on MSDN for this calling convention, so I ran several experiments here: https://godbolt.org/z/n1zn1Gx1z

Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D108887
2021-09-08 08:22:46 +08:00