llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.d...

; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefixes=GCN,UNPACKED,GFX89 %s
; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefixes=GCN,PACKED,GFX81,GFX89 %s
; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefixes=GCN,PACKED,GFX9,GFX89 %s
; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefixes=GCN,GFX10 %s

; GCN-LABEL: {{^}}image_gather4_b_2d_v4f16:
; UNPACKED: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x4 d16{{$}}
; PACKED: image_gather4_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x4 d16{{$}}
; GFX10: image_gather4_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D d16{{$}}
define amdgpu_ps <2 x float> @image_gather4_b_2d_v4f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {
main_body:
  %tex = call <4 x half> @llvm.amdgcn.image.gather4.b.2d.v4f16.f32.f32(i32 4, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
  %r = bitcast <4 x half> %tex to <2 x float>
  ret <2 x float> %r
}

declare <4 x half> @llvm.amdgcn.image.gather4.b.2d.v4f16.f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

attributes #0 = { nounwind }
attributes #1 = { nounwind readonly }
attributes #2 = { nounwind readnone }
[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698 2019-05-02 00:32:58 +08:00			`; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs \| FileCheck -check-prefixes=GCN,UNPACKED,GFX89 %s`
			`; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs \| FileCheck -check-prefixes=GCN,PACKED,GFX81,GFX89 %s`
			`; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs \| FileCheck -check-prefixes=GCN,PACKED,GFX9,GFX89 %s`
			`; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs \| FileCheck -check-prefixes=GCN,GFX10 %s`
AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166 2018-04-04 18:58:54 +08:00
			`; GCN-LABEL: {{^}}image_gather4_b_2d_v4f16:`
[AMDGPU] Use v3f32 type in image instructions This should lower the amount of used registers for gfx9. I updated some of the changed tests with the update script because changing them by hand is tedious. Differential Revision: https://reviews.llvm.org/D73884 2020-01-29 21:04:56 +08:00			`; UNPACKED: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x4 d16{{$}}`
			`; PACKED: image_gather4_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x4 d16{{$}}`
[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698 2019-05-02 00:32:58 +08:00			`; GFX10: image_gather4_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D d16{{$}}`
AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166 2018-04-04 18:58:54 +08:00			`define amdgpu_ps <2 x float> @image_gather4_b_2d_v4f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {`
			`main_body:`
			`%tex = call <4 x half> @llvm.amdgcn.image.gather4.b.2d.v4f16.f32.f32(i32 4, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)`
			`%r = bitcast <4 x half> %tex to <2 x float>`
			`ret <2 x float> %r`
			`}`

			`declare <4 x half> @llvm.amdgcn.image.gather4.b.2d.v4f16.f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1`

			`attributes #0 = { nounwind }`
			`attributes #1 = { nounwind readonly }`
			`attributes #2 = { nounwind readnone }`