llvm-project/llvm/test/CodeGen/AMDGPU/subreg-coalescer-undef-use.ll

; RUN: llc -march=amdgcn -mcpu=SI -o - %s | FileCheck %s
; Don't crash when the use of an undefined value is only detected by the
; register coalescer because it is hidden with subregister insert/extract.
target triple="amdgcn--"

; CHECK-LABEL: foobar:
; CHECK: s_load_dword s2, s[0:1], 0x9
; CHECK-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xb
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK: v_mbcnt_lo_u32_b32_e64
; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; CHECK-NEXT: s_and_saveexec_b64 s[2:3], vcc
; CHECK-NEXT: s_xor_b64 s[2:3], exec, s[2:3]
; BB0_1:
; CHECK: s_load_dword s0, s[0:1], 0xa
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; BB0_2:
; CHECK: s_or_b64 exec, exec, s[2:3]
; CHECK-NEXT: s_mov_b32 s7, 0xf000
; CHECK-NEXT: s_mov_b32 s6, -1
; CHECK-NEXT: buffer_store_dword v1, off, s[4:7], 0
; CHECK-NEXT: s_endpgm
define void @foobar(float %a0, float %a1, float addrspace(1)* %out) nounwind {
entry:
  %v0 = insertelement <4 x float> undef, float %a0, i32 0
  %tid = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
  %cnd = icmp eq i32 %tid, 0
  br i1 %cnd, label %ift, label %ife

ift:
  %v1 = insertelement <4 x float> undef, float %a1, i32 0
  br label %ife

ife:
  %val = phi <4 x float> [ %v1, %ift ], [ %v0, %entry ]
  %v2 = extractelement <4 x float> %val, i32 1
  store float %v2, float addrspace(1)* %out, align 4
  ret void
}

declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #0

attributes #0 = { nounwind readnone }
Blame history for this file (oldest first; each entry lists the lines it last touched):

llvm-svn: 241027  2015-06-30 08:33:44 +08:00
RegisterCoalescer: Cleanup empty subranges after shrinkToUses()
A call to removeEmptySubranges() is necessary after every operation that potentially removes all segments from a subregister range; this case in the register coalescer was missing.
Lines: the "Don't crash ..." comment, the target triple, the @foobar signature, the %v0 insertelement, and the ift/ife blocks through the closing brace.

llvm-svn: 241200  2015-07-02 06:34:59 +08:00
Test for specific output in lit test
Lines: the RUN line, CHECK-LABEL, both s_load checks at 0x9 and 0xb, both s_waitcnt checks, s_xor_b64, the BB0_1 and BB0_2 comments, s_or_b64, both s_mov_b32 checks, and s_endpgm.

llvm-svn: 260765  2016-02-13 07:45:29 +08:00
AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm
Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16603
Lines: the %tid call, the %cnd compare, the conditional branch, the @llvm.amdgcn.mbcnt.lo declaration, and the attributes line.

llvm-svn: 266345  2016-04-15 01:42:29 +08:00
AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary: This pass is unnecessary and overly conservative. It was motivated by situations like

    def %vreg0:SGPR_32
    ...
  if-block:
    ..
    def %vreg1:SGPR_32
    ...
  else-block:
    ...
    use %vreg0:SGPR_32
    ...

and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap.
However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass.
In addition to proving this change correct, I have tested it with Piglit and a small number of other tests.
Reviewers: arsenm, tstellarAMD
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19041
Lines: the s_load_dword s0 check at 0xa.

llvm-svn: 268015  2016-04-29 17:02:30 +08:00
AMDGPU/SI: Assembler: Unify parsing/printing of operands.
Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc).
Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change.
Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps.
Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print immediate type when printing parsed immediate operand.
Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3).
Update tests.
Reviewers: tstellarAMD, SamWot, artem.tamazov
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19584
Lines: the buffer_store_dword check.

llvm-svn: 268815  2016-05-07 06:43:50 +08:00
DetectDeadLanes: Increase precision when detecting undef inputs
In case of COPY-like instruction we may be able to deduce that a certain input is unused, based on the used lanes of the register defined by the instruction. This even works across otherwise incompatible copies (no need to have compatible lanemasks, completely unused operands are still completely unused). It even makes sense to redo the analysis in this case since we gained information for a case we previously stopped at because of the incompatible masks.
Lines: the v_mbcnt_lo_u32_b32_e64 check and the s_and_saveexec_b64 check.

llvm-svn: 282832  2016-09-30 09:50:20 +08:00
AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one.
Lines: the v_cmp_eq_u32_e32 check.