llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Matt Arsenault	75cf30918f	AMDGPU: Assume f32 denormals are enabled by default This will likely introduce catastrophic performance regressions on older subtargets, but should be correct. A follow up change will remove the old fp32-denormals subtarget features, and switch to using the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends should be making sure to add the denormal-fp-math-f32 attribute when appropriate to avoid performance regressions.	2020-04-02 17:17:12 -04:00
Changpeng Fang	884acbb9e1	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588	2020-02-07 11:46:23 -08:00
Changpeng Fang	2531535984	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293	2020-01-23 16:57:43 -08:00
Stanislav Mekhanoshin	9d7b1c9ddb	[AMDGPU] Always use rcp + mul with fast math Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308	2017-07-06 20:34:21 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Konstantin Zhuravlyov	e14df4b236	[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions Differential Revision: https://reviews.llvm.org/D24125 llvm-svn: 282624	2016-09-28 20:05:39 +00:00

7 Commits