llvm-project/llvm/test/CodeGen/AMDGPU/waitcnt-preexisting.mir

# RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs -run-pass si-insert-waitcnts -o - %s | FileCheck -check-prefixes=GCN %s
# The S_WAITCNT that already precedes the DS reads must be tracked by
# si-insert-waitcnts. The first use of a DS_READ2_B32 result should then only
# wait for the first read (S_WAITCNT 383, lgkmcnt(1) on this subtarget), and
# the full wait (S_WAITCNT 127, lgkmcnt(0)) should be deferred until the
# result of the second read is used.

# GCN-LABEL: name: test{{$}}
# GCN: S_WAITCNT -16257
# GCN: DS_READ2_B32
# GCN: DS_READ2_B32
# GCN: S_WAITCNT 383{{$}}
# GCN-NEXT: $vgpr1 = V_OR_B32_e32 1, killed $vgpr1, implicit $exec
# GCN-NEXT: $vgpr1 = V_MAX_U32_e32 killed $vgpr0, killed $vgpr1, implicit $exec
# GCN-NEXT: S_WAITCNT 127{{$}}
# GCN-NEXT: $vgpr1 = V_MAX_U32_e32 killed $vgpr2, killed $vgpr1, implicit $exec
--- |
  define amdgpu_cs void @test() {
    ret void
  }
...
---
name:            test
body:             |
  bb.0:
    liveins: $sgpr0, $sgpr1, $vgpr0

    renamable $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7 = S_LOAD_DWORDX8_IMM renamable $sgpr0_sgpr1, 480, 0, 0
    renamable $vgpr13 = V_LSHLREV_B32_e32 2, killed $vgpr0, implicit $exec
    S_WAITCNT -16257
    renamable $vgpr0_vgpr1 = DS_READ2_B32 renamable $vgpr13, 0, 1, 0, implicit $m0, implicit $exec
    renamable $vgpr2_vgpr3 = DS_READ2_B32 renamable $vgpr13, 2, 3, 0, implicit $m0, implicit $exec
    renamable $vgpr1 = V_OR_B32_e32 1, killed $vgpr1, implicit $exec
    renamable $vgpr1 = V_MAX_U32_e32 killed $vgpr0, killed $vgpr1, implicit $exec
    renamable $vgpr1 = V_MAX_U32_e32 killed $vgpr2, killed $vgpr1, implicit $exec
    renamable $vgpr1 = V_MAX_U32_e32 killed $vgpr3, killed $vgpr1, implicit $exec
    $vgpr0 = V_MOV_B32_e32 $vgpr1, implicit $exec, implicit $exec
    $vgpr2 = V_MOV_B32_e32 $vgpr1, implicit $exec, implicit $exec
    $vgpr3 = V_MOV_B32_e32 $vgpr1, implicit $exec, implicit $exec
    IMAGE_STORE_V4_V2 killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, killed renamable $vgpr0_vgpr1, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 15, -1, 1, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 16)
    S_ENDPGM 0
...
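The raw S_WAITCNT immediates in the check lines are easier to read once split into their counter fields. The following is a minimal sketch, not part of the test. It assumes the pre-gfx9 simm16 layout that applies to hawaii (vmcnt in bits [3:0], expcnt in bits [6:4], lgkmcnt in bits [11:8]), and `decode_waitcnt` is a hypothetical helper name, not an LLVM API.

```python
def decode_waitcnt(imm: int) -> dict:
    """Split an S_WAITCNT simm16 into counters (assumed pre-gfx9 layout)."""
    bits = imm & 0xFFFF  # reinterpret the signed immediate as 16 raw bits
    return {
        "vmcnt": bits & 0xF,           # bits [3:0]
        "expcnt": (bits >> 4) & 0x7,   # bits [6:4]
        "lgkmcnt": (bits >> 8) & 0xF,  # bits [11:8]
    }


# The three immediates that appear in this test.
for imm in (-16257, 383, 127):
    print(imm, decode_waitcnt(imm))
```

Under that assumed layout, all three values leave vmcnt and expcnt at their maximum (no wait on those counters) and differ only in lgkmcnt: 0 for -16257 and 127, and 1 for 383.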