llvm-project/llvm/test/CodeGen
Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of these counters
would also wait for all earlier operations tracked by the other counters.
This patch makes s_waitcnt wait only for the counters needed by the next
instruction.

Here's an example of a subtle performance loss that this patch fixes:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
It is added because the last
buffer_load_format_xyzw needs s[44:47], which is loaded
by the first s_load_dwordx4. The wait covers all VM operations
issued before that s_load_dwordx4.

Internally, 3 counters (for VM, EXP and LGKM) are updated after every
instruction. For example, buffer_load_format_xyzw will increase the VM
counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the counters stored for s[44:47] also record that using
the register requires waiting for the previous buffer_load_format_xyzw.

Instead, this patch stores only the counters that matter for the register
and puts zero for the other ones, since we don't need any wait for them.
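
The difference between the two bookkeeping schemes can be sketched with a
small Python model. This is an illustration only, not the code from the
patch: the class WaitModel, its issue() method and the replay() helper are
hypothetical names, each register range is treated as a single key, and all
three counters are assumed to complete in order. The model also knows
nothing about instructions before the listing, so it does not reproduce the
earlier s_waitcnt lgkmcnt(0).

```python
KINDS = ("vm", "exp", "lgkm")


class WaitModel:
    """Toy model of per-register wait tracking (hypothetical, simplified)."""

    def __init__(self, precise):
        self.precise = precise                   # True models the patched scheme
        self.issued = dict.fromkeys(KINDS, 0)    # operations issued so far
        self.waited = dict.fromkeys(KINDS, 0)    # already covered by earlier waits
        self.defined = {}                        # register -> stored counters

    def issue(self, kind, defs=(), uses=()):
        """Issue one instruction and return the waits needed before it."""
        waits = {}
        for k in KINDS:
            required = max(
                (self.defined[r][k] for r in uses if r in self.defined),
                default=0,
            )
            if required > self.waited[k]:
                # Number of newer operations that may still be outstanding.
                waits[k] = self.issued[k] - required
                self.waited[k] = required
        self.issued[kind] += 1
        for r in defs:
            # Old scheme: store all three counters.
            # Patched scheme: store only the counter this instruction bumps.
            self.defined[r] = {
                k: self.issued[k] if (k == kind or not self.precise) else 0
                for k in KINDS
            }
        return waits


def replay(precise):
    """Replay the listing above; return the waits before the last load."""
    m = WaitModel(precise)
    # v0 is omitted because nothing in the listing defines it.
    m.issue("vm", defs=["v[8:11]"], uses=["s[44:47]"])
    m.issue("vm", defs=["v[12:15]"], uses=["s[48:51]"])
    m.issue("lgkm", defs=["s[44:47]"])
    m.issue("vm", defs=["v[16:19]"], uses=["s[52:55]"])
    m.issue("lgkm", defs=["s[48:51]"])
    return m.issue("vm", defs=["v[20:23]"], uses=["s[44:47]"])


print("without patch:", replay(precise=False))
print("with patch:   ", replay(precise=True))
```

In this model the old scheme asks for a vmcnt(1) wait before the last
buffer_load_format_xyzw, because the counters stored for s[44:47] include
the two earlier VM operations. With only the LGKM counter stored, the VM
wait disappears.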

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
..
AArch64 AArch64: Fix testcase of r245640 2015-08-21 00:23:19 +00:00
AMDGPU AMDGPU/SI: Better handle s_wait insertion 2015-08-21 22:47:27 +00:00
ARM [ARM] Fix MachO CPU Subtype selection 2015-08-21 21:52:48 +00:00
BPF [bpf] rename triple names bpf_be -> bpfeb 2015-06-05 16:11:14 +00:00
CPP [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction 2015-04-16 23:24:18 +00:00
Generic Update test suite to make "ninja check" succeed without native backend builtin 2015-08-04 06:32:54 +00:00
Hexagon DI: Disallow uniquable DICompileUnits 2015-08-03 17:26:41 +00:00
Inputs DI: Disallow uniquable DICompileUnits 2015-08-03 17:26:41 +00:00
MIR MIR Serialization: Serialize the pointer IR expression values in the machine 2015-08-21 21:54:12 +00:00
MSP430 [opaque pointer type] Add textual IR support for explicit type parameter to gep operator 2015-03-13 18:20:45 +00:00
Mips Revert r229675 - [mips] Avoid redundant sign extension of the result of binary bitwise instructions. 2015-08-04 14:26:35 +00:00
NVPTX Use 32-bit divides instead of 64-bit divides where possible. 2015-08-11 22:16:34 +00:00
PowerPC [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers 2015-08-21 21:34:24 +00:00
SPARC [Sparc] Support user-specified stack object overalignment. 2015-08-21 04:17:56 +00:00
SystemZ [DAGCombiner] Attempt to mask vectors before zero extension instead of after. 2015-08-15 13:27:30 +00:00
Thumb DI: Disallow uniquable DICompileUnits 2015-08-03 17:26:41 +00:00
Thumb2 ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2 2015-07-21 00:18:59 +00:00
WebAssembly [WebAssembly] Use the default alignment for SIMD types. 2015-08-19 20:30:20 +00:00
WinEH [WinEH] Calculate state numbers for the new EH representation 2015-08-18 19:07:12 +00:00
X86 [x86] enable machine combiner reassociations for 256-bit vector min/max 2015-08-21 21:04:21 +00:00
XCore DI: Disallow uniquable DICompileUnits 2015-08-03 17:26:41 +00:00