Commit Graph

9 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov c2c2eb7d01 AMDGPU: Add D16 instructions preserve unused bits feature
- Predicate D16 patterns on this new feature
- Added this new feature to gfx900/2/4

Differential Revision: https://reviews.llvm.org/D46366

llvm-svn: 331551
2018-05-04 20:06:57 +00:00
Yaxun Liu 0124b5484c [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170

llvm-svn: 325030
2018-02-13 18:00:25 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Matt Arsenault 3f71c0e3ee AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.

llvm-svn: 319270
2017-11-29 00:55:57 +00:00
Matt Arsenault 45b98189bd AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.

llvm-svn: 318240
2017-11-15 00:45:43 +00:00
Matt Arsenault 4b7938c658 AMDGPU: Fix not converting d16 load/stores to offset
Fixes missed optimization with new MUBUF instructions.

llvm-svn: 318106
2017-11-13 23:24:26 +00:00
Matt Arsenault e1cd482fda AMDGPU: Select d16 loads into low component of register
llvm-svn: 318005
2017-11-13 00:22:09 +00:00
Matt Arsenault b866d10127 AMDGPU: Fix potentially incorrectly matching check lines
These check lines are supposed to make sure the new d16
load instructions aren't used, but the expected instruction
name is a prefix of the incorrect instruction name.

llvm-svn: 314714
2017-10-02 20:31:16 +00:00
Matt Arsenault b81495dccb AMDGPU: Match load d16 hi instructions
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.

We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.

llvm-svn: 313716
2017-09-20 05:01:53 +00:00