Matt Arsenault
6c29c5acfe
AMDGPU: Allow SIShrinkInstructions to work in non-SSA
...
Immediates can be folded as long as the immediate is a vreg.
Also undo commuting instructions if it didn't fold an immediate.
llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Stanislav Mekhanoshin
5fa289f0d8
[AMDGPU] Narrow lshl from 64 to 32 bit if possible
...
Turn an expensive 64-bit shift into a 32-bit shift if the shift does not overflow int:
shl (ext x) => zext (shl x)
Differential Revision: https://reviews.llvm.org/D33367
llvm-svn: 303569
2017-05-22 16:58:10 +00:00
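Note on the lshl narrowing above: a minimal sketch of why `shl (ext x) => zext (shl x)` is only valid when the shift does not overflow 32 bits. It assumes the extension is a zero-extension and models the values with plain Python integers; the helper names (`shl64_of_zext`, `zext_of_shl32`, `shift_fits_in_32`) are hypothetical and are not taken from the patch, whose actual legality check is not shown in this log.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shl64_of_zext(x: int, s: int) -> int:
    """Wide form: zero-extend a 32-bit x to 64 bits, then shift left by s."""
    return ((x & MASK32) << s) & MASK64


def zext_of_shl32(x: int, s: int) -> int:
    """Narrow form: shift in 32 bits (high bits are dropped), then zero-extend."""
    return ((x & MASK32) << s) & MASK32


def shift_fits_in_32(x: int, s: int) -> bool:
    """Assumed legality condition: no set bit of x is shifted past bit 31."""
    return 0 <= s < 32 and ((x & MASK32) << s) <= MASK32


# The two forms agree when the shift fits ...
assert shift_fits_in_32(0x0000FFFF, 16)
assert shl64_of_zext(0x0000FFFF, 16) == zext_of_shl32(0x0000FFFF, 16)

# ... and differ when a set bit would be shifted out of the low 32 bits.
assert not shift_fits_in_32(0x0001FFFF, 16)
assert shl64_of_zext(0x0001FFFF, 16) != zext_of_shl32(0x0001FFFF, 16)
```

In a compiler the same condition would typically be established from known leading zero bits rather than from concrete values; that detail is an assumption here.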
Matt Arsenault
3dbeefa978
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
...
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
2017-03-21 21:39:51 +00:00
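Note on the conversion above: a rough Python equivalent of the perl substitution, shown only to make the effect on a test file concrete. The function name `mark_as_kernel` and the sample IR lines are hypothetical; like the perl `s///` without the `g` flag, it rewrites at most the first match on each line.

```python
import re


def mark_as_kernel(text: str) -> str:
    """Rewrite the first 'define void' on each line to 'define amdgpu_kernel void'."""
    lines = text.split("\n")
    rewritten = [
        re.sub(r"define void", "define amdgpu_kernel void", line, count=1)
        for line in lines
    ]
    return "\n".join(rewritten)


sample = "define void @foo() {\n  ret void\n}"
converted = mark_as_kernel(sample)
print(converted)
```

Because the replacement text no longer contains the string `define void`, running the substitution a second time leaves already converted lines unchanged. Functions that return a non-void type are not matched at all.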
Matt Arsenault
10268f93e8
AMDGPU: Use v_med3_{f16|i16|u16}
...
llvm-svn: 296401
2017-02-27 22:40:39 +00:00
Matt Arsenault
f84e5d9a27
AMDGPU: Generalize matching of v_med3_f32
...
I think this is safe as long as all inputs are known to never
be NaNs.
Also add an intrinsic for fmed3 to be able to handle all safe
math cases.
llvm-svn: 293598
2017-01-31 03:07:46 +00:00
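Note on the med3 matching above: a minimal sketch of how a median of three can be written using only min and max, and why a clamp with ordered bounds is one instance of it. The function names `med3` and `clamp` are illustrative, and the exact set of patterns the commit matches is not listed in this log, so treating the clamp as a matched pattern is an assumption.

```python
from itertools import permutations


def med3(a, b, c):
    """Median of three values, expressed with min and max only."""
    return max(min(a, b), min(max(a, b), c))


def clamp(x, lo, hi):
    """min(max(x, lo), hi): restrict x to the range [lo, hi]."""
    return min(max(x, lo), hi)


# med3 returns the middle element for every ordering of the inputs.
for a, b, c in permutations((1.0, 2.0, 3.0)):
    assert med3(a, b, c) == 2.0

# When lo <= hi, clamping x is the same as taking the median of x, lo and hi.
for x in (-5.0, 0.0, 0.25, 1.0, 7.5):
    assert clamp(x, 0.0, 1.0) == med3(x, 0.0, 1.0)
```

The equivalence relies on the inputs being totally ordered. A NaN input breaks that ordering, so the min/max form and a true median can disagree, which is consistent with the commit message limiting itself to the NaN-free case.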
Matt Arsenault
f411071d63
DAG: Consider nnan in isKnownNeverNaN
...
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Matt Arsenault
45f8216cee
AMDGPU: Remove superfluous string attributes from tests
...
Also fix v_mac.ll not testing the right thing for fneg.
llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Matt Arsenault
3d1c1deb04
AMDGPU: Run SIFoldOperands after PeepholeOptimizer
...
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378
2016-04-14 21:58:24 +00:00
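Note on the shader-db totals above: the percentages in the "Totals" blocks are consistent with a plain relative change, (new - old) / old * 100, rounded to two decimals. The sketch below reproduces a few of them. The function name `percent_change` is hypothetical, and reporting 0.00 when the old value is zero is an assumption inferred from the "Scratch: 0 -> 1024 (0.00 %)" line, not from the shader-db source.

```python
def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent, rounded to two decimals."""
    if old == 0:
        # Assumed convention: a zero baseline is reported as 0.00 %.
        return 0.0
    return round((new - old) / old * 100, 2)


print(percent_change(34200, 34336))  # SGPRS, totals
print(percent_change(22118, 21655))  # VGPRS, totals
print(percent_change(5169, 4706))    # VGPRS, affected shaders
print(percent_change(0, 1024))       # Scratch, affected shaders
```

Read this way, the VGPR reduction is about 2 % overall and about 9 % across the affected shaders, while scratch usage in the affected shaders goes from 0 to 1024 bytes per wave even though the reported percentage is 0.00.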
Matt Arsenault
5b39b34ca5
AMDGPU: Match fmed3 patterns with legacy fmin/fmax
...
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault
f639c32739
AMDGPU: Match some med3 patterns
...
llvm-svn: 259089
2016-01-28 20:53:42 +00:00