forked from OSchip/llvm-project
Commit f99ccf6516
This gives up to a ~30x speedup (at the 1k and 10k benchmark sizes) compared to expanding Tanh into exp operations:

```
name                  old cpu/op   new cpu/op   delta
BM_mlir_Tanh_f32/10    253ns ± 3%    55ns ± 7%  -78.35%  (p=0.000 n=44+41)
BM_mlir_Tanh_f32/100  2.21µs ± 4%  0.14µs ± 8%  -93.85%  (p=0.000 n=48+49)
BM_mlir_Tanh_f32/1k   22.6µs ± 4%   0.7µs ± 5%  -96.68%  (p=0.000 n=32+42)
BM_mlir_Tanh_f32/10k   225µs ± 5%     7µs ± 6%  -96.88%  (p=0.000 n=49+55)

name                  old time/op  new time/op  delta
BM_mlir_Tanh_f32/10    259ns ± 1%    56ns ± 2%  -78.31%  (p=0.000 n=41+39)
BM_mlir_Tanh_f32/100  2.27µs ± 1%  0.14µs ± 5%  -93.89%  (p=0.000 n=46+49)
BM_mlir_Tanh_f32/1k   22.9µs ± 1%   0.8µs ± 4%  -96.67%  (p=0.000 n=30+42)
BM_mlir_Tanh_f32/10k   230µs ± 0%     7µs ± 3%  -96.88%  (p=0.000 n=37+55)
```

This approximation is based on the Eigen::generic_fast_tanh function.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D96739
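To illustrate what "approximating Tanh" means here, below is a minimal scalar C++ sketch of the kind of rational approximation the commit message refers to. It is not the MLIR lowering itself (which emits vectorizable IR). It assumes the style of Eigen's `generic_fast_tanh_float`: small inputs pass through unchanged, other inputs are clamped, and the result is evaluated as an odd degree-13 polynomial divided by an even degree-6 polynomial. The coefficients and clamp value are reproduced from Eigen's non-FMA float path and should be checked against the Eigen source. The names `fast_tanh` and `max_abs_error` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical scalar helper: rational approximation of tanh(x) in the style
// of Eigen's generic_fast_tanh_float (coefficients assumed from Eigen's
// non-FMA float path; verify before relying on them).
float fast_tanh(float a_x) {
  // For very small |x|, tanh(x) ~= x to float precision.
  if (std::fabs(a_x) < 0.0004f) return a_x;

  // Clamp the input; past this point the rational function saturates near 1.
  const float clamp = 7.90531110763549805f;
  const float x = std::min(std::max(a_x, -clamp), clamp);
  const float x2 = x * x;

  // Numerator p(x) = x * P(x^2): odd polynomial of degree 13, Horner form.
  float p = -2.76076847742355e-16f;
  p = p * x2 + 2.00018790482477e-13f;
  p = p * x2 + -8.60467152213735e-11f;
  p = p * x2 + 5.12229709037114e-08f;
  p = p * x2 + 1.48572235717979e-05f;
  p = p * x2 + 6.37261928875436e-04f;
  p = p * x2 + 4.89352455891786e-03f;
  p = p * x;

  // Denominator q(x) = Q(x^2): even polynomial of degree 6, Horner form.
  float q = 1.19825839466702e-06f;
  q = q * x2 + 1.18534705686654e-04f;
  q = q * x2 + 2.26843463243900e-03f;
  q = q * x2 + 4.89352518554385e-03f;

  return p / q;
}

// Hypothetical helper: maximum absolute error of fast_tanh against
// double-precision std::tanh on a uniform grid of steps+1 points in [lo, hi].
double max_abs_error(float lo, float hi, int steps) {
  double max_err = 0.0;
  for (int i = 0; i <= steps; ++i) {
    float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(steps);
    double err = std::fabs(static_cast<double>(fast_tanh(x)) -
                           std::tanh(static_cast<double>(x)));
    max_err = std::max(max_err, err);
  }
  return max_err;
}
```

The approximation costs a handful of multiply-adds and one division, with no calls to `exp`. That is consistent with the speedup reported above, although the MLIR version's exact cost depends on how the IR is lowered and vectorized. A quick check such as `max_abs_error(-10.0f, 10.0f, 20000)` should stay well below `1e-5`.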
| File |
|---|
| async-group.mlir |
| async-value.mlir |
| async.mlir |
| bare_ptr_call_conv.mlir |
| global_memref.mlir |
| lit.local.cfg |
| math_polynomial_approx.mlir |
| memref_reinterpret_cast.mlir |
| memref_reshape.mlir |
| sgemm_naive_codegen.mlir |
| simple.mlir |
| unranked_memref.mlir |
| utils.mlir |