llvm-project/llvm/test/CodeGen/AMDGPU/fp32_to_fp16.ll

; RUN:  llc -amdgpu-scalarize-global-loads=false  -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
; RUN:  llc -amdgpu-scalarize-global-loads=false  -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s
; RUN:  llc -amdgpu-scalarize-global-loads=false  -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

declare i16 @llvm.convert.to.fp16.f32(float) nounwind readnone

; FUNC-LABEL: {{^}}test_convert_fp32_to_fp16:
; GCN: buffer_load_dword [[VAL:v[0-9]+]]
; GCN: v_cvt_f16_f32_e32 [[RESULT:v[0-9]+]], [[VAL]]
; GCN: buffer_store_short [[RESULT]]

; EG: MEM_RAT MSKOR
; EG: VTX_READ_32
; EG: FLT32_TO_FLT16
define amdgpu_kernel void @test_convert_fp32_to_fp16(i16 addrspace(1)* noalias %out, float addrspace(1)* noalias %in) nounwind {
  %val = load float, float addrspace(1)* %in, align 4
  %cvt = call i16 @llvm.convert.to.fp16.f32(float %val) nounwind readnone
  store i16 %cvt, i16 addrspace(1)* %out, align 2
  ret void
}
[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097 2017-07-05 01:32:00 +08:00			`; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=FUNC %s`
			`; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN -check-prefix=FUNC %s`
			`; RUN: llc -amdgpu-scalarize-global-loads=false -march=r600 -mcpu=cypress -verify-machineinstrs < %s \| FileCheck -check-prefix=EG -check-prefix=FUNC %s`
R600/SI: Add support for llvm.convert.{to\|from}.fp16 llvm-svn: 212676 2014-07-10 11:22:20 +08:00
CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248 2014-07-17 18:51:23 +08:00			`declare i16 @llvm.convert.to.fp16.f32(float) nounwind readnone`
R600/SI: Add support for llvm.convert.{to\|from}.fp16 llvm-svn: 212676 2014-07-10 11:22:20 +08:00
AMDGPU/EG,CM: Add fp16 conversion instructions Differential Revision: https://reviews.llvm.org/D28164 llvm-svn: 291622 2017-01-11 08:12:39 +08:00			`; FUNC-LABEL: {{^}}test_convert_fp32_to_fp16:`
			`; GCN: buffer_load_dword [[VAL:v[0-9]+]]`
			`; GCN: v_cvt_f16_f32_e32 [[RESULT:v[0-9]+]], [[VAL]]`
			`; GCN: buffer_store_short [[RESULT]]`

			`; EG: MEM_RAT MSKOR`
			`; EG: VTX_READ_32`
			`; EG: FLT32_TO_FLT16`
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444 2017-03-22 05:39:51 +08:00			`define amdgpu_kernel void @test_convert_fp32_to_fp16(i16 addrspace(1)* noalias %out, float addrspace(1)* noalias %in) nounwind {`
[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace\(\d+\) )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794 2015-02-28 05:17:42 +08:00			`%val = load float, float addrspace(1)* %in, align 4`
CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248 2014-07-17 18:51:23 +08:00			`%cvt = call i16 @llvm.convert.to.fp16.f32(float %val) nounwind readnone`
R600/SI: Add support for llvm.convert.{to\|from}.fp16 llvm-svn: 212676 2014-07-10 11:22:20 +08:00			`store i16 %cvt, i16 addrspace(1)* %out, align 2`
			`ret void`
			`}`