llvm-project/llvm/test/CodeGen/NVPTX/vector-loads.ll

; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s

; Even though general vector types are not supported in PTX, we can still
; optimize loads/stores with pseudo-vector instructions of the form:
;
; ld.v2.f32 {%f0, %f1}, [%r0]
;
; which will load two floats at once into scalar registers.

define void @foo(<2 x float>* %a) {
; CHECK: .func foo
; CHECK: ld.v2.f32 {%f{{[0-9]+}}, %f{{[0-9]+}}}
  %t1 = load <2 x float>, <2 x float>* %a
  %t2 = fmul <2 x float> %t1, %t1
  store <2 x float> %t2, <2 x float>* %a
  ret void
}

define void @foo2(<4 x float>* %a) {
; CHECK: .func foo2
; CHECK: ld.v4.f32 {%f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}}
  %t1 = load <4 x float>, <4 x float>* %a
  %t2 = fmul <4 x float> %t1, %t1
  store <4 x float> %t2, <4 x float>* %a
  ret void
}

define void @foo3(<8 x float>* %a) {
; CHECK: .func foo3
; CHECK: ld.v4.f32 {%f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}}
; CHECK-NEXT: ld.v4.f32 {%f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}, %f{{[0-9]+}}}
  %t1 = load <8 x float>, <8 x float>* %a
  %t2 = fmul <8 x float> %t1, %t1
  store <8 x float> %t2, <8 x float>* %a
  ret void
}


define void @foo4(<2 x i32>* %a) {
; CHECK: .func foo4
; CHECK: ld.v2.u32 {%r{{[0-9]+}}, %r{{[0-9]+}}}
  %t1 = load <2 x i32>, <2 x i32>* %a
  %t2 = mul <2 x i32> %t1, %t1
  store <2 x i32> %t2, <2 x i32>* %a
  ret void
}

define void @foo5(<4 x i32>* %a) {
; CHECK: .func foo5
; CHECK: ld.v4.u32 {%r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}}
  %t1 = load <4 x i32>, <4 x i32>* %a
  %t2 = mul <4 x i32> %t1, %t1
  store <4 x i32> %t2, <4 x i32>* %a
  ret void
}

define void @foo6(<8 x i32>* %a) {
; CHECK: .func foo6
; CHECK: ld.v4.u32 {%r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}}
; CHECK-NEXT: ld.v4.u32 {%r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}, %r{{[0-9]+}}}
  %t1 = load <8 x i32>, <8 x i32>* %a
  %t2 = mul <8 x i32> %t1, %t1
  store <8 x i32> %t2, <8 x i32>* %a
  ret void
}
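
; The header comment mentions stores as well as loads, but the functions above
; only check the loads. The function below is a minimal sketch of a matching
; store check and is not part of the original test. The name @store_v2f32 is
; hypothetical, and the expected st.v2.f32 spelling assumes the same
; -march=nvptx -mcpu=sm_20 configuration as the RUN line; other targets or
; newer LLVM versions may print a different type suffix.

define void @store_v2f32(<2 x float>* %src, <2 x float>* %dst) {
; CHECK: .func store_v2f32
; CHECK: st.v2.f32 [%r{{[0-9]+}}], {%f{{[0-9]+}}, %f{{[0-9]+}}}
  ; Copy one <2 x float> from %src to %dst so that the store stays live.
  %v = load <2 x float>, <2 x float>* %src
  store <2 x float> %v, <2 x float>* %dst
  ret void
}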

; Blame history for this file:
;
; llvm-svn: 174968 (2013-02-12 22:18:49 +08:00)
;   [NVPTX] Disable vector registers. Vectors were being manually scalarized by
;   the backend. Instead, let the target-independent code do all of the work.
;   The manual scalarization was from a time before good target-independent
;   support for scalarization in LLVM. However, this forces us to
;   specially-handle vector loads and stores, which we can turn into PTX
;   instructions that produce/consume multiple operands.
;
; llvm-svn: 177465 (2013-03-20 08:10:32 +08:00), touching the ld.v2/ld.v4 CHECK lines
;   Propagate DAG node ordering during type legalization and instruction
;   selection. A node's ordering is only propagated during legalization if
;   (a) the new node does not have an ordering (is not a CSE'd node), or
;   (b) the new node has an ordering that is higher than the node being
;   legalized.
;
; llvm-svn: 230794 (2015-02-28 05:17:42 +08:00), touching the load lines
;   [opaque pointer type] Add textual IR support for explicit type parameter to
;   load instruction. Essentially the same as the GEP change in r230786. A
;   similar migration script can be used to update test cases, though a few
;   more test case improvements/changes were required this time around
;   (r229269-r229278). Reviewers: rafael, dexonsmith, grosser.
;   Differential Revision: http://reviews.llvm.org/D7649