llvm-project/llvm/test/CodeGen/AMDGPU/fmax3.f64.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

declare double @llvm.maxnum.f64(double, double) nounwind readnone

; SI-LABEL: {{^}}test_fmax3_f64:
; SI-DAG: buffer_load_dwordx2 [[REGA:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0{{$}}
; SI-DAG: buffer_load_dwordx2 [[REGB:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:8
; SI: v_max_f64 [[REGA]], [[REGA]], [[REGB]]
; SI: buffer_load_dwordx2 [[REGC:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16
; SI: v_max_f64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[REGA]], [[REGC]]
; SI: buffer_store_dwordx2 [[RESULT]],
; SI: s_endpgm
define amdgpu_kernel void @test_fmax3_f64(double addrspace(1)* %out, double addrspace(1)* %aptr) nounwind {
  %bptr = getelementptr double, double addrspace(1)* %aptr, i32 1
  %cptr = getelementptr double, double addrspace(1)* %aptr, i32 2
  %a = load volatile double, double addrspace(1)* %aptr, align 8
  %b = load volatile double, double addrspace(1)* %bptr, align 8
  %c = load volatile double, double addrspace(1)* %cptr, align 8
  %f0 = call double @llvm.maxnum.f64(double %a, double %b) nounwind readnone
  %f1 = call double @llvm.maxnum.f64(double %f0, double %c) nounwind readnone
  store double %f1, double addrspace(1)* %out, align 8
  ret void
}
Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982 2017-01-25 06:02:15 +08:00			`; RUN: llc -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=SI %s`
			`; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s \| FileCheck -check-prefix=SI %s`
R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386 2015-03-16 23:53:55 +08:00
			`declare double @llvm.maxnum.f64(double, double) nounwind readnone`

			`; SI-LABEL: {{^}}test_fmax3_f64:`
AMDGPU/SI: Assembler: Unify parsing/printing of operands. Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015 2016-04-29 17:02:30 +08:00			`; SI-DAG: buffer_load_dwordx2 [[REGA:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0{{$}}`
			`; SI-DAG: buffer_load_dwordx2 [[REGB:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:8`
R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386 2015-03-16 23:53:55 +08:00			`; SI: v_max_f64 [[REGA]], [[REGA]], [[REGB]]`
AMDGPU/SI: Implement a custom MachineSchedStrategy Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995 2016-08-30 03:42:52 +08:00			`; SI: buffer_load_dwordx2 [[REGC:v\[[0-9]+:[0-9]+\]]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16`
R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386 2015-03-16 23:53:55 +08:00			`; SI: v_max_f64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[REGA]], [[REGC]]`
			`; SI: buffer_store_dwordx2 [[RESULT]],`
			`; SI: s_endpgm`
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444 2017-03-22 05:39:51 +08:00			`define amdgpu_kernel void @test_fmax3_f64(double addrspace(1)* %out, double addrspace(1)* %aptr) nounwind {`
R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386 2015-03-16 23:53:55 +08:00			`%bptr = getelementptr double, double addrspace(1)* %aptr, i32 1`
			`%cptr = getelementptr double, double addrspace(1)* %aptr, i32 2`
AMDGPU: Add volatile to test loads and stores When the memory vectorizer is enabled, these tests break. These tests don't really care about the memory instructions, and it's easier to write check lines with the unmerged loads. llvm-svn: 266071 2016-04-12 21:38:18 +08:00			`%a = load volatile double, double addrspace(1)* %aptr, align 8`
			`%b = load volatile double, double addrspace(1)* %bptr, align 8`
			`%c = load volatile double, double addrspace(1)* %cptr, align 8`
R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy llvm-svn: 232386 2015-03-16 23:53:55 +08:00			`%f0 = call double @llvm.maxnum.f64(double %a, double %b) nounwind readnone`
			`%f1 = call double @llvm.maxnum.f64(double %f0, double %c) nounwind readnone`
			`store double %f1, double addrspace(1)* %out, align 8`
			`ret void`
			`}`