Matt Arsenault
3dbeefa978
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
...
Currently, functions with the default C calling convention are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (then reverted in the one place that
actually wanted a non-kernel).
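For readers who do not use perl, the substitution above can be sketched in Python. This is an illustration only, not the tooling used for the commit; the function name mark_kernels is hypothetical.

```python
import re


def mark_kernels(ir_text: str) -> str:
    """Mirror s/define void/define amdgpu_kernel void/ on LLVM IR text.

    mark_kernels is a hypothetical helper name used for illustration.
    """
    # Perl's s/// without the /g flag rewrites only the first match on
    # each line, so the substitution is applied per line with count=1.
    return "".join(
        re.sub(r"define void", "define amdgpu_kernel void", line, count=1)
        for line in ir_text.splitlines(keepends=True)
    )
```

Functions that already carry a calling convention, or that return a non-void type, do not contain the literal text "define void" and are left untouched, so running the rewrite twice has no further effect.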
llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
3ea06336fc
AMDGPU: Remove some uses of llvm.SI.export in tests
...
Merge some of the old, smaller tests into more complete versions.
llvm-svn: 295792
2017-02-22 00:02:21 +00:00
Matt Arsenault
7aad8fd8f4
Enable FeatureFlatForGlobal on Volcanic Islands
...
This switches the Mesa path to the workaround that HSA
uses by default.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Matt Arsenault
972034bda9
AMDGPU: Fix formatting of 1/2pi immediate
...
llvm-svn: 286912
2016-11-15 00:04:33 +00:00
Matt Arsenault
c88ba36eab
AMDGPU: Use 1/2pi inline imm on VI
...
I'm guessing at how it is supposed to be printed.
llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Matt Arsenault
b5f2bb1a88
AMDGPU: Change check prefix in test
...
llvm-svn: 285449
2016-10-28 20:33:01 +00:00
Matt Arsenault
bbb47da8a1
AMDGPU: Support commuting with immediate in src0
...
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Matt Arsenault
2b957b5a6f
AMDGPU: Make i64 loads/stores promote to v2i32
...
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.
This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
to be folded with the base pointer arithmetic.
llvm-svn: 268293
2016-05-02 20:07:26 +00:00
Matt Arsenault
3d1c1deb04
AMDGPU: Run SIFoldOperands after PeepholeOptimizer
...
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
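The percentages in the two "Totals" blocks above are consistent with a plain relative change, (after - before) / before. A minimal sketch follows; percent_change is a hypothetical name, and returning 0.00 for a zero baseline is an assumption made only to match the "Scratch: 0 -> 1024 (0.00 %)" line, not a statement about how shader-db computes it.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change in percent, rounded to two decimals.

    percent_change is a hypothetical helper; shader-db's own
    implementation is not shown in this log.
    """
    if before == 0:
        # Assumption: a zero baseline is reported as 0.00 %, as in the
        # affected-shaders Scratch line (0 -> 1024).
        return 0.0
    return round((after - before) / before * 100, 2)
```

For example, percent_change(22118, 21655) reproduces the -2.09 % VGPR change in the overall totals.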
llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
9a19c240c0
AMDGPU: Materialize sign bits with bfrev
...
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
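The check described above can be sketched as follows. This is a simplified illustration, not the LLVM implementation: it considers only the integer inline immediates (-16 through 64) and ignores the floating-point ones, and the names bitreverse32 and bfrev_source are hypothetical.

```python
def bitreverse32(x: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    x &= 0xFFFFFFFF
    return int(f"{x:032b}"[::-1], 2)


def bfrev_source(value: int):
    """Return an integer inline immediate whose bit reverse is `value`.

    Returns None when no such immediate exists. Simplification: only the
    integer inline immediates -16..64 are considered here.
    """
    rev = bitreverse32(value)
    # Reinterpret the reversed pattern as a signed 32-bit integer.
    signed = rev - (1 << 32) if rev & 0x80000000 else rev
    return signed if -16 <= signed <= 64 else None
```

The sign bit 0x80000000 is the bit reverse of 1, so bfrev_source(0x80000000) returns 1 and the constant can be produced by bit-reversing an inline immediate instead of embedding a 32-bit literal. A value that is already an inline immediate needs no rewrite, so a real pass would test for that first.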
llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Matt Arsenault
0de924b76d
AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE
...
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.
llvm-svn: 251855
2015-11-02 23:15:42 +00:00
Tom Stellard
45bb48ea19
R600 -> AMDGPU rename
...
llvm-svn: 239657
2015-06-13 03:28:10 +00:00