llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	4b47213951	AMDGPU: Switch backend default max workgroup size to 1024 Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.	2019-11-13 07:11:02 +05:30
Changpeng Fang	ba92059ca9	AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements Summary: This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D33559 llvm-svn: 325372	2018-02-16 19:14:17 +00:00
Yaxun Liu	cc56a8b108	[AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment Differential Revision: https://reviews.llvm.org/D39657 llvm-svn: 317479	2017-11-06 14:32:33 +00:00
Stanislav Mekhanoshin	c90347d760	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Konstantin Zhuravlyov	1d65026ca6	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747	2016-09-06 20:22:28 +00:00
Matt Arsenault	8a028bf4d7	AMDGPU: Fix promote alloca pass creating huge arrays This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708	2016-05-16 21:19:59 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00

9 Commits