; RUN: llc -mtriple amdgcn-amd-- -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; Before the fix that this test was committed with, this code would leave
; an unused stack slot, causing ScratchSize to be non-zero.
; GCN-LABEL: store_v3i32:
; GCN: ds_read_b32
; GCN: ds_read_b64
; GCN: ds_write_b32
; GCN: ds_write_b64
; GCN: ScratchSize: 0
; Load a <3 x i32> from LDS, add the kernel argument vector to it, and store
; the result back to the same LDS address. The whole round trip must stay in
; DS (LDS) instructions without spilling to scratch (ScratchSize: 0).
define amdgpu_kernel void @store_v3i32(<3 x i32> addrspace(3)* %out, <3 x i32> %a) nounwind {
  %val = load <3 x i32>, <3 x i32> addrspace(3)* %out
  %val.1 = add <3 x i32> %a, %val
  store <3 x i32> %val.1, <3 x i32> addrspace(3)* %out, align 16
  ret void
}
; GCN-LABEL: store_v5i32:
; GCN: ds_read_b32
; GCN: ds_read2_b64
; GCN: ds_write_b32
; GCN: ds_write2_b64
; GCN: ScratchSize: 0
; Same pattern as @store_v3i32 but with a <5 x i32>, which the backend splits
; into a b32 access plus a paired ds_read2/ds_write2 of two b64 halves; again
; no scratch usage is allowed (ScratchSize: 0).
define amdgpu_kernel void @store_v5i32(<5 x i32> addrspace(3)* %out, <5 x i32> %a) nounwind {
  %val = load <5 x i32>, <5 x i32> addrspace(3)* %out
  %val.1 = add <5 x i32> %a, %val
  store <5 x i32> %val.1, <5 x i32> addrspace(3)* %out, align 16
  ret void
}