llvm-project


Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion

We can wait on either VM, EXP or LGKM. The waits are independent.

Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

    buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
    buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
    s_load_dwordx4 s[44:47], s[8:9], 0xc
    s_waitcnt lgkmcnt(0)
    buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
    s_load_dwordx4 s[48:51], s[8:9], 0x10
    s_waitcnt vmcnt(1)
    buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless. It is added because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished.

Internally, 3 counters (for VM, EXP and LGKM) are updated after every instruction. For example, buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755

2015-08-21 22:47:27 +00:00
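The counter bookkeeping described in the commit message can be illustrated with a small Python model. This is only a sketch, not the code from the patch: the `WaitTracker` class, its methods, and the `replay` helper are hypothetical names, and the model assumes that operations on a given counter complete in issue order. It replays the listing above and reports which wait would be needed before the last `buffer_load_format_xyzw` reads `s[44:47]`.

```python
COUNTERS = ("vm", "exp", "lgkm")


class WaitTracker:
    """Toy model of per-register wait tracking (hypothetical, not LLVM code)."""

    def __init__(self, per_counter):
        # per_counter=False: snapshot all three counters (behavior before the patch).
        # per_counter=True: keep only the counter of the defining instruction.
        self.per_counter = per_counter
        self.issued = dict.fromkeys(COUNTERS, 0)  # operations issued so far
        self.waited = dict.fromkeys(COUNTERS, 0)  # operations known to be complete
        self.snapshots = {}

    def define(self, reg, kind):
        """An instruction of the given kind writes reg."""
        self.issued[kind] += 1
        snap = dict(self.issued)
        if self.per_counter:
            snap = {c: (v if c == kind else 0) for c, v in snap.items()}
        self.snapshots[reg] = snap

    def wait(self, kind, n):
        """Model an explicit s_waitcnt: at most n operations stay outstanding."""
        self.waited[kind] = max(self.waited[kind], self.issued[kind] - n)

    def use(self, reg):
        """Return {counter: N} for the s_waitcnt needed before reading reg."""
        snap = self.snapshots.get(reg)
        waits = {}
        if snap is None:
            return waits
        for c in COUNTERS:
            if snap[c] > self.waited[c]:
                waits[c] = self.issued[c] - snap[c]
                self.waited[c] = snap[c]
        return waits


def replay(per_counter):
    """Replay the listing and return the wait needed for the final s[44:47] read."""
    t = WaitTracker(per_counter)
    t.define("v[8:11]", "vm")      # buffer_load_format_xyzw
    t.define("v[12:15]", "vm")     # buffer_load_format_xyzw
    t.define("s[44:47]", "lgkm")   # s_load_dwordx4
    t.wait("lgkm", 0)              # s_waitcnt lgkmcnt(0), taken from the listing
    t.define("v[16:19]", "vm")     # buffer_load_format_xyzw
    t.define("s[48:51]", "lgkm")   # s_load_dwordx4
    return t.use("s[44:47]")       # operand of the last buffer_load_format_xyzw


print(replay(per_counter=False))
print(replay(per_counter=True))
```

In this model, snapshotting all three counters makes the read of `s[44:47]` depend on the two earlier buffer loads, which yields a `vmcnt(1)` wait. Zeroing the counters that the defining `s_load_dwordx4` does not touch leaves only the LGKM dependency, which the earlier `lgkmcnt(0)` already satisfied.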
..
AArch64	AArch64: Fix testcase of r245640	2015-08-21 00:23:19 +00:00
AMDGPU	AMDGPU/SI: Better handle s_wait insertion	2015-08-21 22:47:27 +00:00
ARM	[ARM] Fix MachO CPU Subtype selection	2015-08-21 21:52:48 +00:00
BPF	[bpf] rename triple names bpf_be -> bpfeb	2015-06-05 16:11:14 +00:00
CPP	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction	2015-04-16 23:24:18 +00:00
Generic	Update test suite to make "ninja check" succeed without native backend builtin	2015-08-04 06:32:54 +00:00
Hexagon	DI: Disallow uniquable DICompileUnits	2015-08-03 17:26:41 +00:00
Inputs	DI: Disallow uniquable DICompileUnits	2015-08-03 17:26:41 +00:00
MIR	MIR Serialization: Serialize the pointer IR expression values in the machine	2015-08-21 21:54:12 +00:00
MSP430	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator	2015-03-13 18:20:45 +00:00
Mips	Revert r229675 - [mips] Avoid redundant sign extension of the result of binary bitwise instructions.	2015-08-04 14:26:35 +00:00
NVPTX	Use 32-bit divides instead of 64-bit divides where possible.	2015-08-11 22:16:34 +00:00
PowerPC	[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers	2015-08-21 21:34:24 +00:00
SPARC	[Sparc] Support user-specified stack object overalignment.	2015-08-21 04:17:56 +00:00
SystemZ	[DAGCombiner] Attempt to mask vectors before zero extension instead of after.	2015-08-15 13:27:30 +00:00
Thumb	DI: Disallow uniquable DICompileUnits	2015-08-03 17:26:41 +00:00
Thumb2	ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2	2015-07-21 00:18:59 +00:00
WebAssembly	[WebAssembly] Use the default alignment for SIMD types.	2015-08-19 20:30:20 +00:00
WinEH	[WinEH] Calculate state numbers for the new EH representation	2015-08-18 19:07:12 +00:00
X86	[x86] enable machine combiner reassociations for 256-bit vector min/max	2015-08-21 21:04:21 +00:00
XCore	DI: Disallow uniquable DICompileUnits	2015-08-03 17:26:41 +00:00