Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all earlier operations of the other kinds.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of a subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
It is added because the last buffer_load_format_xyzw needs
s[44:47], which is loaded by the first s_load_dwordx4.
The inserted wait also requires every VM operation issued
before that load to have finished.

Internally, 3 counters (for VM, EXP and LGKM) are updated after
every instruction. For example, buffer_load_format_xyzw will
increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the counters stored for s[44:47] say that using the
register requires waiting for the previous buffer_load_format_xyzw.

Instead, this patch stores only the counters that matter for the
register, and puts zero for the other ones, since we don't need any
wait for them.
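
The difference between the two bookkeeping schemes can be illustrated
with a small simulation. This is only a rough sketch in Python, not the
code of the actual LLVM pass: the function name required_waits, the
(counter, defs, uses) tuple layout and the single-name registers (s44
standing in for s[44:47], and so on) are all assumptions made for
illustration. It also assumes operations of one kind complete in issue
order and leaves out whatever defined s[52:55] earlier, so the lgkm
values it reports need not match the listing above.

```python
COUNTERS = ("vm", "exp", "lgkm")

def required_waits(program, per_counter_tracking):
    """For each instruction, return the s_waitcnt values needed before it.

    program: list of (counter, defs, uses) tuples (hypothetical layout).
    per_counter_tracking=False models the old behaviour (store all three
    counters per defined register); True models the patched behaviour.
    """
    last_issued = dict.fromkeys(COUNTERS, 0)  # operations issued per kind
    waited_on = dict.fromkeys(COUNTERS, 0)    # operations already waited for
    reg_snapshot = {}                         # register -> counters to wait on
    waits = []
    for counter, defs, uses in program:
        needed = {}
        for reg in uses:
            snap = reg_snapshot.get(reg, {})
            for c in COUNTERS:
                required = snap.get(c, 0)     # zero means "no wait needed"
                if required > waited_on[c]:
                    # Number of newer operations allowed to stay in flight.
                    allowed = last_issued[c] - required
                    needed[c] = min(needed.get(c, allowed), allowed)
        for c, n in needed.items():
            waited_on[c] = max(waited_on[c], last_issued[c] - n)
        waits.append(needed)
        last_issued[counter] += 1
        for reg in defs:
            if per_counter_tracking:
                # Patched: remember only the counter this definition bumps.
                reg_snapshot[reg] = {counter: last_issued[counter]}
            else:
                # Old: remember a snapshot of all three counters.
                reg_snapshot[reg] = dict(last_issued)
    return waits

# The six instructions of the example, with the existing waits left out.
program = [
    ("vm",   ["v8"],  ["v0", "s44"]),   # buffer_load_format_xyzw v[8:11]
    ("vm",   ["v12"], ["v0", "s48"]),   # buffer_load_format_xyzw v[12:15]
    ("lgkm", ["s44"], ["s8"]),          # s_load_dwordx4 s[44:47]
    ("vm",   ["v16"], ["v0", "s52"]),   # buffer_load_format_xyzw v[16:19]
    ("lgkm", ["s48"], ["s8"]),          # s_load_dwordx4 s[48:51]
    ("vm",   ["v20"], ["v0", "s44"]),   # buffer_load_format_xyzw v[20:23]
]

old = required_waits(program, per_counter_tracking=False)
new = required_waits(program, per_counter_tracking=True)
print("old:", old[-1])
print("new:", new[-1])
```

Under these assumptions, the old scheme asks for a vm wait of 1 before
the last instruction, which corresponds to the useless
s_waitcnt vmcnt(1), while the patched scheme asks for no vm wait at all
and keeps only the lgkm one.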

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
clang [SemaObjC] Remove unused code from test. 2015-08-21 20:28:16 +00:00
clang-tools-extra [clang-tidy] Migrate UseAuto from clang-modernize to clang-tidy. 2015-08-21 15:08:51 +00:00
compiler-rt [MSan] Deprecate __msan_set_death_callback() in favor of __sanitizer_set_death_callback(). 2015-08-21 22:45:12 +00:00
debuginfo-tests New round of fixes for "Always compile debuginfo-tests for the host triple" 2014-10-18 23:47:59 +00:00
libclc Remove files accidentally not removed in r244310 2015-08-13 23:43:12 +00:00
libcxx Remove completed items from TODO.TXT 2015-08-20 19:22:35 +00:00
libcxxabi Fix or disable C++11 tests in C++03 mode 2015-08-20 01:22:17 +00:00
libunwind unwind: fix invalid memory access 2015-08-21 03:21:31 +00:00
lld COFF: Improve debug helper function. 2015-08-21 07:01:10 +00:00
lldb XFAIL pthreads test on Windows. 2015-08-21 22:11:50 +00:00
llgo Update to new lists.llvm.org 2015-08-05 04:03:05 +00:00
llvm AMDGPU/SI: Better handle s_wait insertion 2015-08-21 22:47:27 +00:00
openmp Update z_Linux_asm.s to use platform macros 2015-08-20 19:46:14 +00:00
polly Fix 'unused variable' warning in NASSERTS build 2015-08-21 19:23:21 +00:00