Rather than using wait_for_atomic() when chacking for a response from
the GuC, we can get the effect of a hybrid spin/sleep wait by breaking
it into two stages. First, spin-wait for up to 10us to minimise latency
for "quick" commands; then, if that times out, sleep-wait for up 10ms
(the maximum allowed for a "slow" command).
Being able to do this depends on the recent patch
18f4b84 drm/i915: Use atomic waits for short non-atomic ones
and is similar to the hybrid approach in
1758b90 drm/i915: Use a hybrid scheme for fast register waits
(although we can't use that as-is, because that interface doesn't quite
match what we need here).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467815411-21756-1-git-send-email-david.s.gordon@intel.com
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().
text data bss dec hex filename
1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko
1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
No need for local struct drm_device * since dev_priv is the
correct thing to pass in to NEEDS_WaRsDisableCoarsePowerGating
anyway. Changed the macro definition for the latter to reflect
that as well.
v2: Alignment bikeshed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466518034-24838-1-git-send-email-tvrtko.ursulin@linux.intel.com
The ONLY places that guc_id (aka hw_id) should be used are those where
the value or address is determined by and shared with the GuC firmware;
specifically, when filling in the GuC-context-descriptor or the GuC
addon data, or putting an entry in the GuC's work queue.
It need not (and therefore should not) be used to index GuC statistics
or similar host-managed tracking data. In particular, i915_guc_submit()
produces (and debugfs decodes) GuC submission statistics which should be
indexed by driver-engine-id rather then guc-engine-id.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466432287-5799-1-git-send-email-david.s.gordon@intel.com
During a hibernate/resume cycle, the whole system is reset, including
the GuC and the doorbell hardware. Then the system is booted up, drivers
are loaded, etc -- the GuC firmware may be loaded and set running at
this point. But then, the booted kernel is replaced by the hibernated
image, and this resumed kernel will also try to reload the GuC firmware
(which will fail). To recover, we reset the GuC and try again (which
should work). But this GuC reset doesn't also reset the doorbell
hardware, so it can be left in a state inconsistent with that assumed
by the driver and/or the newly-loaded GuC firmware.
It would be better if the GuC reset also cleared all doorbell state,
but that's not how the hardware currently works; also, the driver cannot
directly reprogram the doorbell hardware (only the GuC can do that).
So this patch cycles through all doorbells, assigning and releasing each
in turn, so that all the doorbell hardware is left in a consistent
state, no matter how it was programmed by the previously-running kernel
and/or GuC firmware.
v2: don't use kmap_atomic() now that client page 0 is kept mapped.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465837054-16245-2-git-send-email-david.s.gordon@intel.com
This version doesn't update the doorbell bitmap, as that will
be done when the selected doorbell is associated with a client.
The call is now slightly earlier, just on the general principle
that potentially-failing operations should be done as early as
possible, to eliminate late failures and simplify recovery.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This patch refactors the driver's handling and tracking of doorbells, in
preparation for a later one which will resolve a suspend-resume issue.
There are three resources to be managed:
1. Cachelines: a single line within the client-object's page 0
is snooped by doorbell hardware for writes from the host.
2. Doorbell registers: each defines one cacheline to be snooped.
3. Bitmap: tracks which doorbell registers are in use.
The doorbell setup/teardown protocol starts with:
1. Pick a cacheline: select_doorbell_cacheline()
2. Find an available doorbell register: assign_doorbell()
(These values are passed to the GuC via the shared context
descriptor; this part of the sequence remains unchanged).
3. Update the bitmap to reflect registers-in-use
4. Prepare the cacheline for use by setting its status to ENABLED
5. Ask the GuC to program the doorbell to snoop the cacheline
and of course teardown is very similar:
6. Set the cacheline to DISABLED
7. Ask the GuC to reprogram the doorbell to stop snooping
8. Record that the doorbell is not in use.
Operations 6-8 (guc_disable_doorbell(), host2guc_release_doorbell(), and
release_doorbell()) were called in sequence from guc_client_free(), but
are now moved into the teardown phase of the common function.
Steps 4-5 (guc_init_doorbell() and host2guc_allocate_doorbell()) were
similarly done as sequential steps in guc_client_alloc(), but since it
turns out that we don't need to be able to do them separately they're
now collected into the setup phase of the common function.
The only new code (and new capability) is the block tagged
/* Update the GuC's idea of the doorbell ID */
i.e. we can now *change* the doorbell register used by an existing
client, whereas previously it was set once for the entire lifetime
of the client. We will use this new feature in the next patch.
v2: Trivial independent fixes pushed ahead as separate patches.
MUCH longer commit message :) [Tvrtko Ursulin]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Just code movement, no actual change to the function. This is in
preparation for the next patch, which will reorganise all the other
doorbell code, but doesn't change this function. So let's shuffle it
down near its caller rather than leaving it mixed in with the setup
code. Unlike the doorbell management code, this function is somewhat
time-critical, so putting it near its caller may even yield a tiny
performance improvement.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
These registers are not actually writable by the CPU; only the GuC can
actually program them. So let's not do writes that have no effect.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bitmap operators are overkill when touching only one bit.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There are four non-static functions in i915_guc_submission.c that take a
'dev' parameter. All are called only from GuC loader code, and can be
easily converted to accept 'dev_priv' instead.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465579766-31595-1-git-send-email-david.s.gordon@intel.com
Convert all static functions in i915_guc_submission.c that currently
take a 'dev' pointer to take 'dev_priv' instead (there are three,
guc_client_alloc(), guc_client_free(), and gem_allocate_guc_obj().
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When resetting and reloading the GuC, the GuC submission management code
also needs to destroy and recreate the GuC client(s). Currently this is
done by a separate call from the GuC loader, but really, it's just an
internal detail of the submission code. So here we remove the call from
the loader (which is too late, really, because the GuC has already been
reloaded at this point) and put it into guc_submission_init() instead.
This means that any preexisting client is destroyed *before* the GuC
(re)load and then recreated after, iff the firmware was successfully
loaded. If the GuC reload fails, we don't recreate the client, so
fallback to execlists mode (if active) won't leak the client object
(previously, the now-unusable client would have been left allocated,
and leaked if the driver were unloaded).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Just move the kernel_context member of drm_i915_private next to the
engines it is associated with.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-7-git-send-email-chris@chris-wilson.co.uk
We want to give a name to the currently anonymous per-engine struct
inside the context, so that we can assign it to a local variable and
save clumsy typing. The name we have chosen is intel_context as it
reflects the HW facing portion of the context state (the logical context
state, the registers, the ringbuffer etc).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-4-git-send-email-chris@chris-wilson.co.uk
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:
s/struct intel_context/struct i915_gem_context/
which fits much better with its constructors already conveying the
i915_gem_context prefix!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
Mostly little optimisations and future-proofing against code breakage.
For instance, if the driver is correctly following the submission
protocol, the "out of space" condition is impossible, so the previous
runtime WARN_ON() is promoted to a GEM_BUG_ON() for a more dramatic
effect in development and less impact in end-user systems.
Similarly we can make alignment checking more stringent and replace
other WARN_ON() conditions that don't relate to the runtime hardware
state with either BUILD_BUG_ON() for compile-time-detectable issues, or
GEM_BUG_ON() for logical "can't happen" errors.
With those changes, we can convert it to void, as suggested by Chris
Wilson, and update the calling code appropriately.
v2:
Note that we're now putting the request seqno in the "fence_id"
field of each GuC-work-item, in case it turns up somewhere useful
(e.g. in a GuC log) [Tvrtko Ursulin].
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Rather than wait to see whether more space becomes available in the GuC
submission workqueue, we can just return -EAGAIN and let the caller try
again in a little while. This gets rid of an uninterruptable sleep in
the polling code :)
We'll also add a counter to the GuC client statistics, to see how often
we find the WQ full.
v2:
Flag the likely() code path (Tvtrko Ursulin).
v4:
Add/update comments about failure counters (Tvtrko Ursulin).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The knowledge of how to derive the relevant client from the request
should be localised within i915_guc_submission.c; the LRC code shouldn't
have to know about the internal details of the GuC submission process.
And all the information the GuC code needs should be encapsulated in (or
reachable from) the request.
v2:
GEM_BUG_ON() for bad GuC client (Tvrtko Ursulin).
Add/update kerneldoc explaining check_space/submit protocol
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Split the function of "enable_guc_submission" into two separate
options. The new one ("enable_guc_loading") controls only the
*fetching and loading* of the GuC firmware image. The existing
one is redefined to control only the *use* of the GuC for batch
submission once the firmware is loaded.
In addition, the degree of control has been refined from a simple
bool to an integer key, allowing several options:
-1 (default) whatever the platform default is
0 DISABLE don't load/use the GuC
1 BEST EFFORT try to load/use the GuC, fallback if not available
2 REQUIRE must load/use the GuC, else leave the GPU wedged
The new platform default (as coded here) will be to attempt to
load the GuC iff the device has a GuC that requires firmware,
but not yet to use it for submission. A later patch will change
to enable it if appropriate.
v4:
Changed some error-message levels, mostly ERROR->INFO, per
review comments by Tvrtko Ursulin.
v5:
Dropped one more error message, disabled GuC submission on
hypothetical firmware-free devices [Tvrtko Ursulin].
v6:
Logging tidy by Tvrtko Ursulin:
* Do not log falling back to execlists when wedging the GPU.
* Do not log fw load errors when load was disabled by user.
* Pass down some error code from fw load for log message to
make more sense.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Fiedorowicz, Lukasz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> (v6)
Pass drm_i915_private to the uncore init/fini routines and their
subservients as it is their native type.
text data bss dec hex filename
6309978 3578778 696320 10585076 a183f4 vmlinux
6309530 3578778 696320 10584628 a18234 vmlinux
a modest 400 bytes of saving, but 60 lines of code deleted!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462885804-26750-1-git-send-email-chris@chris-wilson.co.uk
Propagate the real error from drm_gem_object_init(). Note this also
fixes some confusion in the error return from i915_gem_alloc_object...
v2:
(Matthew Auld)
- updated new users of gem_alloc_object from latest drm-nightly
- replaced occurrences of IS_ERR_OR_NULL() with IS_ERR()
v3:
(Joonas Lahtinen)
- fix double "From:" in commit message
- add goto teardown path
v4:
(Matthew Auld)
- rebase with i915_gem_alloc_object name change
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461587533-8841-1-git-send-email-matthew.auld@intel.com
[Joonas: Removed spurious " = NULL" from _init() function]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Because having both i915_gem_object_alloc() and i915_gem_alloc_object()
(with different return conventions) is just too confusing!
(i915_gem_object_alloc() is the low-level memory allocator, and remains
unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO
initialises the newly-allocated object.)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com
Now that we keep the GuC client process descriptor permanently mapped,
we don't really need to keep a local copy of the GuC's work-queue-head.
So we can simplify the code a little by not doing this.
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Don't use kmap_atomic() for doorbell & process descriptor access.
This patch fixes the BUG shown below, where the thread could sleep
while holding a kmap_atomic mapping. In order not to need to call
kmap_atomic() in this code path, we now set up a permanent kernel
mapping of the shared doorbell and process-descriptor page, and
use that in all doorbell and process-descriptor related code.
BUG: scheduling while atomic: gem_close_race/1941/0x00000002
Modules linked in: hid_generic usbhid i915 asix usbnet libphy mii
i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm coretemp i2c_hid
hid video pinctrl_sunrisepoint pinctrl_intel acpi_pad nls_iso8859_1
e1000e ptp psmouse pps_core ahci libahci
CPU: 0 PID: 1941 Comm: gem_close_race Tainted: G U 4.4.0-160121+ #123
Hardware name: Intel Corporation Skylake Client platform/Skylake AIO
DDR3L RVP10, BIOS SKLSE2R1.R00.X100.B01.1509220551 09/22/2015
0000000000013e40 ffff880166c27a78 ffffffff81280d02 ffff880172c13e40
ffff880166c27a88 ffffffff810c203a ffff880166c27ac8 ffffffff814ec808
ffff88016b7c6000 ffff880166c28000 00000000000f4240 0000000000000001
Call Trace:
[<ffffffff81280d02>] dump_stack+0x4b/0x79
[<ffffffff810c203a>] __schedule_bug+0x41/0x4f
[<ffffffff814ec808>] __schedule+0x5a8/0x690
[<ffffffff814ec927>] schedule+0x37/0x80
[<ffffffff814ef3fd>] schedule_hrtimeout_range_clock+0xad/0x130
[<ffffffff81090be0>] ? hrtimer_init+0x10/0x10
[<ffffffff814ef3f1>] ? schedule_hrtimeout_range_clock+0xa1/0x130
[<ffffffff814ef48e>] schedule_hrtimeout_range+0xe/0x10
[<ffffffff814eef9b>] usleep_range+0x3b/0x40
[<ffffffffa01ec109>] i915_guc_wq_check_space+0x119/0x210 [i915]
[<ffffffffa01da47c>] intel_logical_ring_alloc_request_extras+0x5c/0x70 [i915]
[<ffffffffa01cdbf1>] i915_gem_request_alloc+0x91/0x170 [i915]
[<ffffffffa01c1c07>] i915_gem_do_execbuffer.isra.25+0xbc7/0x12a0 [i915]
[<ffffffffa01cb785>] ? i915_gem_object_get_pages_gtt+0x225/0x3c0 [i915]
[<ffffffffa01d1fb6>] ? i915_gem_pwrite_ioctl+0xd6/0x9f0 [i915]
[<ffffffffa01c2e68>] i915_gem_execbuffer2+0xa8/0x250 [i915]
[<ffffffffa00f65d8>] drm_ioctl+0x258/0x4f0 [drm]
[<ffffffffa01c2dc0>] ? i915_gem_execbuffer+0x340/0x340 [i915]
[<ffffffff8111590d>] do_vfs_ioctl+0x2cd/0x4a0
[<ffffffff8111eac2>] ? __fget+0x72/0xb0
[<ffffffff81115b1c>] SyS_ioctl+0x3c/0x70
[<ffffffff814effd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
------------[ cut here ]------------
v4:
Only tear down doorbell & kunmap() client object if we actually
succeeded in allocating a client object (Tvrtko Ursulin)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93847
Original-version-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvtrko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').
v2:
s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com
Equivalent to the existing for_each_engine() macro, this will replace
the latter wherever the third argument *is* actually wanted (in most
places, it is not used). The third argument is renamed to emphasise
that it is an engine id (type enum intel_engine_id). All the callers of
the macro that actually need the third argument are updated to use this
version, and the argument (generally 'i') is also updated to be 'id'.
Other callers (where the third argument is unused) are untouched for
now; they will be updated in the next patch.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously GuC uses ring id as engine id because of same definition.
But this is not true since this commit:
commit de1add3605
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri Jan 15 15:12:50 2016 +0000
drm/i915: Decouple execbuf uAPI from internal implementation
Added GuC engine id into GuC interface to decouple it from ring id used
by driver.
v2: Keep ring name print out in debugfs; using for_each_ring() where
possible to keep driver consistent. (Chris W.)
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com
Since:
commit 82352e908a
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri Jan 15 17:12:45 2016 +0000
drm/i915: Cache LRC state page in the context
and:
commit 0eb973d31d
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri Jan 15 15:10:28 2016 +0000
drm/i915: Cache ringbuffer GTT VMA
We can also remove the ring buffer start updates on every
context update since the address will not change for the
duration of the LRC pin.
For GuC we can remove the update altogether because it
only cares about the ring buffer start.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453466567-33369-1-git-send-email-tvrtko.ursulin@linux.intel.com
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.
All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.
From an idea by Chris Wilson.
v2: transform an extra instance of ring->default_context introduced by
42f1cae8c drm/i915: Restore inhibiting the load of the default context
That patch's commentary includes:
v2: Mark the global default context as uninitialized on GPU reset so
that the context-local workarounds are reloaded upon re-enabling
The code implementing that now also benefits from the replacement of
the multiple (per-ring) pointers to the default context with a single
pointer to the unique kernel context.
v4: Rebased, remove underused local (Nick Hoath)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC needs to know which registers and how they will be saved and
restored during event such as engine reset or power state changes.
For now only the base address of reg state is initialized. The
detail register table probably will be setup in future GuC TDR or
Preemption patch series.
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450468812-4882-5-git-send-email-yu.dai@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC supports different scheduling policies for its four internal
queues. Currently these have been set to the same default values
as KMD_NORMAL queue.
Particularly POLICY_MAX_NUM_WI is set to 15 to match GuC internal
maximum submit queue numbers to avoid an out-of-space problem.
This value indicates max number of work items allowed to be queued
for one DPC process. A smaller value will let GuC schedule more
frequently while a larger number may increase chances to optimize
cmds (such as collapse cmds from same lrc) with risks that keeps
CS idle.
v1: tidy up code
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450468812-4882-4-git-send-email-yu.dai@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The GuC firmware uses this for various purposes. The ADS itself is
a chunk of memory created by driver to share with GuC. Its members
are usually addresses telling where GuC to access them, including
things like scheduler policies, register list that will be saved
and restored during reset etc.
This is the first patch of a series to enable GuC ADS. For now, we
only create the ADS obj whilst keep it disabled.
v1: remove dead code checking return of kmap_atomic (Chris Wilson)
v2: use kmap instead of the atomic version of it.
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450468812-4882-3-git-send-email-yu.dai@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Split GuC work queue space checking from submission and move it to
ring_alloc_request_extras. The reason is that failure in later
i915_add_request() won't be handled. In the case timeout happens,
driver can return early in order to handle the error.
v1: Move wq_reserve_space to ring_reserve_space
v2: Move wq_reserve_space to alloc_request_extras (Chris Wilson)
v3: The work queue head pointer is cached by driver now. So we can
quickly return if space is available.
s/reserve/check/g (Dave Gordon)
v4: Update cached wq head after ring doorbell; check wq space before
ring doorbell in case unexpected error happens; call wq space
check only when GuC submission is enabled. (Dave Gordon)
Signed-off-by: Alex Dai <yu.dai@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450295155-10050-1-git-send-email-yu.dai@intel.com
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 344df9809f ("drm/i915/skl: Disable coarse power gating up until F0")
failed to take into account that the same workaround is used in guc
when forcewake is sampled.
Wrap the condition check inside a macro and use it in both places
to fix the guc side scope.
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450286318-6854-1-git-send-email-mika.kuoppala@intel.com
In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or the
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.
The typical sequence is:
va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
*(va+offset) = ...
kunmap_atomic(va);
Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache. This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.
Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.
Separate patches deal with the cases where whole objects are (or may
be) dirtied.
v3: Mark two more pages dirty in the page-boundary-crossing
cases of the execbuffer relocation code [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For now, remove the spinlocks that protected the GuC's
statistics block and work queue; they are only accessed
by code that already holds the global struct_mutex, and
so are redundant (until the big struct_mutex rewrite!).
The specific problem that the spinlocks caused was that
if the work queue was full, the driver would try to
spinwait for one jiffy, but with interrupts disabled the
jiffy count would not advance, leading to a system hang.
The issue was found using test case igt/gem_close_race.
The new version will usleep() instead, still holding
the struct_mutex but without any spinlocks.
v4: Reorganize commit message (Dave Gordon)
v3: Remove unnecessary whitespace churn
v2: Clean up wq_lock too
v1: Clean up host2guc lock as well
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449104189-27591-1-git-send-email-yu.dai@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
The size / offset information of all firmware ingredients are
now caculated from header. Driver will validate the header and
rsa key size. If any component is out of boundary, driver will
reject the loading too.
v6: Clean up warnings from make docs
v5: Tidy up GuC titles in kernel/Doc
v4: Now using 'size_dw' for those defined in css_header
v3: 1) Move DOC to intel_guc_fwif.h right before css_header
definition. Add more comments.
2) Change 'size' to 'len' or 'length' to avoid confusion.
3) Add UOS_RSA_SCRATCH_MAX_COUNT according to BSpec. And
driver validate size of RSA key now.
4) Add fw component size/offset info to intel_guc_fw.
v2: Add indent into DOC to make fixed-width format rather than
change the tmpl.
v1: 1) guc_css_header is defined as __packed now
2) Add and correct GuC related topics in kernel/Doc
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Revision checks are almost always accompanied by a platform check. (The
exceptions are platform specific code.) Add helpers to check for a
platform and a revision range: IS_SKL_REVID() and IS_BXT_REVID(). In
most places this simplifies and clarifies the code. It will be obvious
that revid macros are used for the correct platform.
This should make it easier to find all the revision checks for
workarounds for each platform, and make it easier to remove them once we
drop support for early hardware revisions.
This should also make it easier to differentiate between Skylake and
Kabylake revision checks when Kabylake support is added.
v2: rebase
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445343722-3312-3-git-send-email-jani.nikula@intel.com
Add host2guc interface to notify GuC power state changes when
enter or resume from power saving state.
v3: Move intel_guc_suspend to i915_drm_suspend for consistency.
v2: Add GuC suspend/resume to runtime suspend/resume too
v1: Change to a more flexible way when fill host to GuC scratch
data in order to remove hard coding.
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC expects two bits for Render and Media domain separately when
driver sends data via host2guc SAMPLE_FORCEWAKE. Bit 0 is for
Render and bit 1 is for Media domain.
v2: Keep sync with code for WaRsDoubleRc6WrlWithCoarsePowerGating
v1: Add parameters definition to avoid magic value
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If rc6 is enabled, notify GuC so it can do proper forcewake before
command submission.
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>