Commit Graph

3 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 9d7b1c9ddb [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

llvm-svn: 307308
2017-07-06 20:34:21 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00