Matt Arsenault
ebf46143ea
AMDGPU: Don't form fmed3 if it will require materialization
...
If there is a single use constant, it can be folded into the
min/max, but not into med3.
llvm-svn: 342443
2018-09-18 02:34:54 +00:00
Matt Arsenault
c3dc8e65e2
DAG: Enhance isKnownNeverNaN
...
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.
Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.
llvm-svn: 338910
2018-08-03 18:27:52 +00:00
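The signaling/quiet NaN distinction that the isKnownNeverNaN change above adds a parameter for can be illustrated at the bit level. This is a minimal sketch, assuming the IEEE 754-2008 binary32 encoding, in which a set top mantissa bit marks a quiet NaN. The name classify_f32_bits is hypothetical and is not part of LLVM.

```python
def classify_f32_bits(bits):
    """Classify a 32-bit float bit pattern as 'snan', 'qnan', or 'not-nan'.

    Assumes the IEEE 754-2008 binary32 layout (hypothetical helper, not an
    LLVM API): 1 sign bit, 8 exponent bits, 23 mantissa bits.
    """
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    # A NaN has an all-ones exponent and a nonzero mantissa.
    if exponent != 0xFF or mantissa == 0:
        return "not-nan"
    # The most significant mantissa bit is the "quiet" bit.
    return "qnan" if mantissa & 0x400000 else "snan"


if __name__ == "__main__":
    for pattern in (0x7FC00000, 0x7F800001, 0x7F800000, 0x3F800000):
        print(hex(pattern), classify_f32_bits(pattern))
```

A check that only needs to rule out signaling NaNs can accept the 'qnan' case, which is a weaker requirement than ruling out every NaN.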
Konstantin Zhuravlyov
c40d9f2e5d
AMDGPU/GCN: Bring processors in sync with AMDGPUUsage
...
- Add gfx704
- Change bonaire to gfx704
- Remove gfx804
- Remove gfx901
- Remove gfx903
Differential Revision: https://reviews.llvm.org/D40046
llvm-svn: 320194
2017-12-08 20:52:28 +00:00
Matt Arsenault
70b9282015
AMDGPU: Fix -enable-var-scope violations
...
llvm-svn: 318004
2017-11-12 23:53:44 +00:00
Matt Arsenault
4e309b0861
AMDGPU: Start selecting global instructions
...
llvm-svn: 309470
2017-07-29 01:03:53 +00:00
Matt Arsenault
6c29c5acfe
AMDGPU: Allow SIShrinkInstructions to work in non-SSA
...
Immediates can be folded as long as the immediate is a vreg.
Also undo commuting instructions if it didn't fold an immediate.
llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Stanislav Mekhanoshin
5fa289f0d8
[AMDGPU] Narrow lshl from 64 to 32 bit if possible
...
Turn an expensive 64-bit shift into a 32-bit one if the shift does not overflow int:
shl (ext x) => zext (shl x)
Differential Revision: https://reviews.llvm.org/D33367
llvm-svn: 303569
2017-05-22 16:58:10 +00:00
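The rewrite shl (ext x) => zext (shl x) from the lshl commit above can be checked with plain integer arithmetic. This is a minimal sketch that models zero-extension only. The helper names (shl64_of_zext, zext_of_shl32, can_narrow) are hypothetical, and the overflow test shown is an illustration, not the actual DAG combine.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shl64_of_zext(x, amt):
    """64-bit shift of a zero-extended 32-bit value: shl (zext x), amt."""
    return ((x & MASK32) << amt) & MASK64


def zext_of_shl32(x, amt):
    """32-bit shift, then zero-extend: zext (shl x, amt)."""
    return ((x & MASK32) << amt) & MASK32


def can_narrow(x, amt):
    """True when the top `amt` bits of the 32-bit value are zero,
    so the 32-bit shift loses no bits (hypothetical check)."""
    x &= MASK32
    if not 0 <= amt < 32:
        return False
    if amt == 0:
        return True
    return (x >> (32 - amt)) == 0


if __name__ == "__main__":
    for x, amt in ((0x0000FFFF, 16), (0x80000000, 1)):
        wide, narrow = shl64_of_zext(x, amt), zext_of_shl32(x, amt)
        print(hex(x), amt, can_narrow(x, amt), hex(wide), hex(narrow))
```

When can_narrow holds, both forms produce the same value. When it does not, the 32-bit shift drops bits that the 64-bit shift keeps, so the narrowing would be incorrect.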
Matt Arsenault
3dbeefa978
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
...
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
10268f93e8
AMDGPU: Use v_med3_{f16|i16|u16}
...
llvm-svn: 296401
2017-02-27 22:40:39 +00:00
Matt Arsenault
f84e5d9a27
AMDGPU: Generalize matching of v_med3_f32
...
I think this is safe as long as all inputs are known to never
be NaNs.
Also add an intrinsic for fmed3 to be able to handle all safe
math cases.
llvm-svn: 293598
2017-01-31 03:07:46 +00:00
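The idea behind matching a min/max pair as a median-of-three, as in the v_med3_f32 commit above, can be sketched for NaN-free inputs. This is a minimal illustration, assuming two bounds k0 <= k1. The names med3, clamp_via_minmax, k0 and k1 are hypothetical, and the code models only the arithmetic identity, not the hardware instruction or its NaN behavior.

```python
def med3(a, b, c):
    """Median of three NaN-free values."""
    return sorted((a, b, c))[1]


def clamp_via_minmax(x, k0, k1):
    """The min(max(x, k0), k1) pattern that a med3 match would replace."""
    return min(max(x, k0), k1)


def equivalent_on(values, k0, k1):
    """Check that both forms agree on every sample (requires k0 <= k1)."""
    assert k0 <= k1, "the identity is only argued for ordered bounds"
    return all(med3(x, k0, k1) == clamp_via_minmax(x, k0, k1) for x in values)


if __name__ == "__main__":
    samples = [-10.0, 1.5, 2.0, 3.0, 4.0, 100.0]
    print(equivalent_on(samples, 2.0, 4.0))
```

With a NaN input the result of the min/max form depends on how each operation treats NaN, which is why the identity is only relied on when the inputs are known never to be NaN.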
Matt Arsenault
f411071d63
DAG: Consider nnan in isKnownNeverNaN
...
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Matt Arsenault
45f8216cee
AMDGPU: Remove superfluous string attributes from tests
...
Also fix v_mac.ll not testing the right thing for fneg
llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Matt Arsenault
3d1c1deb04
AMDGPU: Run SIFoldOperands after PeepholeOptimizer
...
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
5b39b34ca5
AMDGPU: Match fmed3 patterns with legacy fmin/fmax
...
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault
f639c32739
AMDGPU: Match some med3 patterns
...
llvm-svn: 259089
2016-01-28 20:53:42 +00:00