llvm-project/llvm/test/CodeGen/AMDGPU/llvm.SI.fs.interp.ll

;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
;RUN: llc < %s -march=amdgcn -mcpu=kabini -verify-machineinstrs | FileCheck --check-prefix=GCN --check-prefix=16BANK %s
;RUN: llc < %s -march=amdgcn -mcpu=stoney -verify-machineinstrs | FileCheck --check-prefix=GCN --check-prefix=16BANK %s
;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

;GCN-LABEL: {{^}}main:
;GCN-NOT: s_wqm
;GCN: s_mov_b32 m0
;GCN-DAG: v_interp_mov_f32
;GCN-DAG: v_interp_p1_f32
;GCN-DAG: v_interp_p2_f32

define amdgpu_ps void @main(<16 x i8> addrspace(2)* inreg, <16 x i8> addrspace(2)* inreg, <32 x i8> addrspace(2)* inreg, i32 inreg, <2 x i32>) {
main_body:
  %5 = call float @llvm.SI.fs.constant(i32 0, i32 0, i32 %3)
  %6 = call float @llvm.SI.fs.interp(i32 0, i32 0, i32 %3, <2 x i32> %4)
  %7 = call float @llvm.SI.fs.interp(i32 1, i32 0, i32 %3, <2 x i32> %4)
  call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %5, float %6, float %7, float %7)
  ret void
}

; Test that v_interp_p1 uses different source and destination registers
; on 16 bank LDS chips.

; 16BANK-LABEL: {{^}}v_interp_p1_bank16_bug:
; 16BANK-NOT: v_interp_p1_f32 [[DST:v[0-9]+]], [[DST]]
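;
; Illustration only: the register numbers and the elided trailing operands
; below are assumptions, not captured llc output. The negative check above
; binds DST to the destination VGPR and then requires the same register as
; the first source operand, so it behaves roughly like this:
;   v_interp_p1_f32 v1, v1, ...   <- same register twice, the test fails
;   v_interp_p1_f32 v2, v1, ...   <- distinct registers, the test passes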

define amdgpu_ps void @v_interp_p1_bank16_bug([6 x <16 x i8>] addrspace(2)* byval, [17 x <16 x i8>] addrspace(2)* byval, [17 x <4 x i32>] addrspace(2)* byval, [34 x <8 x i32>] addrspace(2)* byval, float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, float, i32, float, float) {
main_body:
  %22 = call float @llvm.SI.fs.interp(i32 0, i32 0, i32 %5, <2 x i32> %7)
  %23 = call float @llvm.SI.fs.interp(i32 1, i32 0, i32 %5, <2 x i32> %7)
  %24 = call float @llvm.SI.fs.interp(i32 2, i32 0, i32 %5, <2 x i32> %7)
  %25 = call float @fabs(float %22)
  %26 = call float @fabs(float %23)
  %27 = call float @fabs(float %24)
  %28 = call i32 @llvm.SI.packf16(float %25, float %26)
  %29 = bitcast i32 %28 to float
  %30 = call i32 @llvm.SI.packf16(float %27, float 1.000000e+00)
  %31 = bitcast i32 %30 to float
  call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %29, float %31, float %29, float %31)
  ret void
}

; Function Attrs: readnone
declare float @fabs(float) #1

; Function Attrs: nounwind readnone
declare i32 @llvm.SI.packf16(float, float) #0

; Function Attrs: nounwind readnone
declare float @llvm.SI.fs.constant(i32, i32, i32) #0

; Function Attrs: nounwind readnone
declare float @llvm.SI.fs.interp(i32, i32, i32, <2 x i32>) #0

declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)

attributes #0 = { nounwind readnone }
attributes #1 = { readnone }