# llvm-project/llvm/test/CodeGen/AMDGPU/fold-imm-copy.mir

# RUN: llc -march=amdgcn -run-pass si-fold-operands -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

# GCN-LABEL:       name: fold-imm-copy
# GCN:             [[SREG:%[0-9]+]]:sreg_32_xm0 = S_MOV_B32 65535
# GCN:             V_AND_B32_e32 [[SREG]]

---
name: fold-imm-copy
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $vgpr0, $sgpr0_sgpr1
    %0:vgpr_32 = COPY $vgpr0
    %1:sgpr_64 = COPY $sgpr0_sgpr1
    %2:sgpr_128 = S_LOAD_DWORDX4_IMM %1, 9, 0, 0
    %3:sreg_32_xm0 = S_MOV_B32 2
    %4:vgpr_32 = V_LSHLREV_B32_e64 killed %3, %0, implicit $exec
    %5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
    %6:vreg_64 = REG_SEQUENCE killed %4, %subreg.sub0, killed %5, %subreg.sub1
    %7:vgpr_32 = BUFFER_LOAD_DWORD_ADDR64 %6, %2, 0, 4, 0, 0, 0, 0, 0, implicit $exec
    %8:sreg_32_xm0 = S_MOV_B32 65535
    %9:vgpr_32 = COPY %8
    %10:vgpr_32 = V_AND_B32_e32 %7, %9, implicit $exec
...

---
# GCN-LABEL:       name: no_extra_fold_on_same_opnd
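# The first XOR must be commuted to fold the immediate 0 from %4.sub1, since
# only src0 of a VOP2 (e32) instruction accepts an immediate. The second XOR
# reads the non-constant %4.sub0 and must be left unchanged.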
# GCN:             V_XOR_B32_e32 {{.*}} 0, %1
# GCN:             V_XOR_B32_e32 %2, %4.sub0
name: no_extra_fold_on_same_opnd
tracksRegLiveness: true
body:             |
  bb.0:
    %0:vgpr_32  = IMPLICIT_DEF
    %1:vgpr_32  = IMPLICIT_DEF
    %2:vgpr_32  = IMPLICIT_DEF
    %3:vgpr_32  = V_MOV_B32_e32 0, implicit $exec
    %4:vreg_64  = REG_SEQUENCE killed %0, %subreg.sub0, killed %3, %subreg.sub1
    %5:vgpr_32  = V_XOR_B32_e32 %1, %4.sub1, implicit $exec
    %6:vgpr_32  = V_XOR_B32_e32 %2, %4.sub0, implicit $exec
...

# History of this test (from the repository blame):
#   2018-08-30 r341068  [AMDGPU] Preliminary patch for divergence driven
#                       instruction selection. Operands Folding 1.
#                       Added fold-imm-copy. https://reviews.llvm.org/D51316
#   2019-05-03 r359882  AMDGPU: Fix test verification. Run the verifier
#                       (-verify-machineinstrs) and enable tracksRegLiveness.
#   2019-10-03 r373491  [AMDGPU] Extend buffer intrinsics with swizzling.
#                       Added the swizzle operand to BUFFER_LOAD_DWORD_ADDR64;
#                       no change in generated code.
#                       https://reviews.llvm.org/D68200
#   2019-10-10 r374284  AMDGPU: Use SGPR_128 instead of SReg_128 for vregs.
#                       SGPR_128 holds only the allocatable SGPRs; SReg_128
#                       adds the non-allocatable TTMP registers.
#   2019-10-24          [AMDGPU] Skip additional folding on the same operand.
#                       Added no_extra_fold_on_same_opnd.
#                       https://reviews.llvm.org/D69355