Commit Graph

6 Commits

Author SHA1 Message Date
Chris Wilson 2564fe708b drm/i915: Disable semaphore busywaits on saturated systems
Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.

The only clue offered by perf, is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.

This patch introduces a heuristic to try and detect when more than one
client is submitting to the GPU pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch and if we planned
to use a semaphore but were too late, conclude that the GPU is
overloaded and not try to use semaphores in future requests. In
practice, this means we optimistically try to use semaphores for the
first frame of a transcode job split over multiple engines, and fail if
there are multiple clients active and continue not to use semaphores for
the subsequent frames in the sequence. Periodically, we try to
optimistically switch semaphores back on whenever the client waits to
catch up with the transcode results.

With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
|                                                    *                   |
|                                                    *+                  |
|                                                    **+                 |
|                                                    **+  x              |
|                                x               *  +**+  x              |
|                                x  x       *    *  +***x xx             |
|                                x  x       *    * *+***x *x             |
|                                x  x*   +  *    * *****x *x x           |
|                         +    x xx+x*   + ***   * ********* x   *       |
|                         +    x xx+x*   * *** +** ********* xx  *       |
|    *   +         ++++*  +    x*x****+*+* ***+*************+x*  *       |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++    *|
|                                   |__________A_____M_____|             |
|                           |_______________A____M_________|             |
|                                 |____________A___M________|            |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       2.60475       3.50941       3.31123     3.2143953    0.21117399
+ 120        2.3826       3.57077       3.25101     3.1414161    0.28146407
Difference at 95.0% confidence
	-0.0729792 +/- 0.0629585
	-2.27039% +/- 1.95864%
	(Student's t, pooled s = 0.248814)
* 120       2.35536       3.66713        3.2849     3.2059917    0.24618565
No difference proven at 95.0% confidence

With 10 clients over-saturating the pipeline:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                    xx ***       |
|                     ++                                    xx ***       |
|                     ++                                    xxx***       |
|                     ++                                    xxx***       |
|                    +++                                    xxx***       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    ++++                                   xx****       |
|                   +++++                                   xx****       |
|                   +++++                                 x x******      |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xx********     |
|                  ++++++                               xxxx********     |
|                  ++++++                               xxxx********     |
|                ++++++++                             xxxxx*********     |
|+ +  +        + ++++++++                           xxx*xx**********x*  *|
|                                                         |__A__|        |
|                 |__AM__|                                               |
|                                                            |__A_|      |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       2.47855        2.8972       2.72376     2.7193402   0.074604933
+ 120       1.17367       1.77459       1.71977     1.6966782   0.085850697
Difference at 95.0% confidence
	-1.02266 +/- 0.0203502
	-37.607% +/- 0.748352%
	(Student's t, pooled s = 0.0804246)
* 120       2.57868       3.00821       2.80142     2.7923878   0.058646477
Difference at 95.0% confidence
	0.0730476 +/- 0.0169791
	2.68622% +/- 0.624383%
	(Student's t, pooled s = 0.0671018)

Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.

Very similar, but of smaller magnitude, results are observed on both
Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
core, in this particular test.

One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only the holistic system-wide view that semaphores of one client
impacts another and reduces the overall throughput where we would choose
to disable semaphores.

The most noticeable negactive impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.

Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
(cherry picked from commit ca6e56f654)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-05-07 12:46:19 +03:00
Chris Wilson 4c5896dc4c drm/i915: Hold a reference to the active HW context
For virtual engines, we need to keep the HW context alive while it
remains in use. For regular HW contexts, they are created and kept alive
until the end of the GEM context. For simplicity, generalise the
requirements and keep an active reference to each HW context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-2-chris@chris-wilson.co.uk
2019-03-19 08:21:13 +00:00
Chris Wilson 206c2f812f drm/i915: Lock the gem_context->active_list while dropping the link
On unpinning the intel_context, we remove it from the active list
inside the GEM context. This list is supposed to be guarded by the GEM
context mutex, so remember to take it!

Fixes: 7e3d9a5941 ("drm/i915: Track active engines within a context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-1-chris@chris-wilson.co.uk
2019-03-19 08:21:11 +00:00
Chris Wilson 0881954965 drm/i915: Introduce intel_context.pin_mutex for pin management
Introduce a mutex to start locking the HW contexts independently of
struct_mutex, with a view to reducing the coarse struct_mutex. The
intel_context.pin_mutex is used to guard the transition to and from being
pinned on the gpu, and so is required before starting to build any
request. The intel_context will then remain pinned until the request
completes, but the mutex can be released immediately unpin completion of
pinning the context.

A slight variant of the above is used by per-context sseu that wants to
inspect the pinned status of the context, and requires that it remains
stable (either !pinned or pinned) across its operation. By using the
pin_mutex to serialise operations while pin_count==0, we can take that
pin_mutex for stabilise the boolean pin status.

v2: for Tvrtko!
* Improved commit message.
* Dropped _gpu suffix from gen8_modify_rpcs_gpu.
v3: Repair the locking for sseu selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
2019-03-08 14:04:19 +00:00
Chris Wilson 95f697eb02 drm/i915: Make context pinning part of intel_context_ops
Push the intel_context pin callback down from intel_engine_cs onto the
context itself by virtue of having a central caller for
intel_context_pin() being able to lookup the intel_context itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk
2019-03-08 13:59:59 +00:00
Chris Wilson c4d52feb2c drm/i915: Move over to intel_context_lookup()
In preparation for an ever growing number of engines and so ever
increasing static array of HW contexts within the GEM context, move the
array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
2019-03-08 13:59:52 +00:00