Matt Arsenault
68e70fb098
AMDGPU: Fix not using v_cvt_f16_[iu]16
...
We weren't treating i16->f16 casts as legal on targets with these
instructions, and always using a pair of casts through i32.
2020-01-07 15:10:07 -05:00
Matt Arsenault
e8d9d202bc
AMDGPU: Add run line to int_to_fp tests
...
This wasn't catching a regression on targets with legal i16 triggered
in a future commit.
2020-01-06 21:38:50 -05:00
Matt Arsenault
3dbeefa978
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
...
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
5d8eb25e78
AMDGPU: Use unsigned compare for eq/ne
...
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.
llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Matt Arsenault
3d1c1deb04
AMDGPU: Run SIFoldOperands after PeepholeOptimizer
...
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
7900334dd5
AMDGPU: Fold bitcasts of scalar constants to vectors
...
This cleans up some messes since the individual scalar components
can be CSEed.
llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Matt Arsenault
9c47dd583a
AMDGPU: Remove some old intrinsic uses from tests
...
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Matt Arsenault
e8df879948
AMDGPU: Improve accuracy of instruction rates for some FP instructions
...
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Tom Stellard
e48fe2a27a
AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions
...
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11061
llvm-svn: 242146
2015-07-14 14:15:03 +00:00
Tom Stellard
45bb48ea19
R600 -> AMDGPU rename
...
llvm-svn: 239657
2015-06-13 03:28:10 +00:00