OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dave Airlie	41d1d0c51f	Merge tag 'drm-intel-gt-next-2021-04-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Prepare for local/device memory support on DG1 by starting to use it for kernel internal allocations: context, ring and engine scratch (Matt A, CQ, Abdiel, Imre) - Sandybridge fix to avoid hard hang on ring resume (Chris) - Limit imported dma-buf size to int32 (Matt A) - Double check heartbeat timeout before resetting (Chris) - Use new tasklet API for execution list (Emil) - Fix SPDX checkpats warnings (Chris) - Fixes for various checkpatch warnings (Chris) - Selftest improvements (Chris) - Move the defer_request waiter active assertion to correct spot (Chris) - Make local-memory probing a GT operation (Matt, Tvrtko) - Protect against request freeing during cancellation on wedging (Chris) - Retire unexpected starting state error dumping (Chris) - Distinction of memory regions in debugging (Zbigniew) - Always flush the submission queue on checking for idle (Chris) - Consolidate 2big error check to helper (Matt) - Decrease number of subplatform bits (Tvrtko) - Remove unused internal request priority levels (Chris) - Document the unused internal header bits in buddy allocator (Matt) - Cleanup the region class/instance encoding (Matt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YGxksaZGXHnFxlwg@jlahtine-mobl.ger.corp.intel.com	2021-04-08 12:46:12 +10:00
Tvrtko Ursulin	9b4d0598ee	drm/i915: Request watchdog infrastructure Prepares the plumbing for setting request/fence expiration time. All code is put in place but is never activated due yet missing ability to actually configure the timer. Outline of the basic operation: A timer is started when request is ready for execution. If the request completes (retires) before the timer fires, timer is cancelled and nothing further happens. If the timer fires request is added to a lockless list and worker queued. Purpose of this is twofold: a) It allows request cancellation from a more friendly context and b) coalesces multiple expirations into a single event of consuming the list. Worker locklessly consumes the list of expired requests and cancels them all using previous added i915_request_cancel(). Associated timeout value is stored in rq->context.watchdog.timeout_us. v2: * Log expiration. v3: * Include more information about user timeline in the log message. v4: * Remove obsolete comment and fix formatting. (Matt) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-6-tvrtko.ursulin@linux.intel.com	2021-03-26 00:58:52 +01:00
Chris Wilson	38b237eab2	drm/i915: Individual request cancellation Currently, we cancel outstanding requests within a context when the context is closed. We may also want to cancel individual requests using the same graceful preemption mechanism. v2 (Tvrtko): * Cancel waiters carefully considering no timeline lock and RCU. * Fixed selftests. v3 (Tvrtko): * Remove error propagation to waiters for now. v4 (Tvrtko): * Rebase for extracted i915_request_active_engine. (Matt) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Resolve conflict because intel_engine_flush_scheduler is still called intel_engine_flush_submission] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com	2021-03-26 00:55:30 +01:00
Tvrtko Ursulin	7dbc19da5d	drm/i915: Extract active lookup engine to a helper Move active engine lookup to exported i915_request_active_engine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Slight rebase, engine->sched.lock is still called engine->active.lock.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com	2021-03-26 00:48:08 +01:00
Chris Wilson	c10e4a7960	drm/i915: Protect against request freeing during cancellation on wedging As soon as we mark a request as completed, it may be retired. So when cancelling a request and marking it complete, make sure we first keep a reference to the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2021-03-24 19:30:36 +01:00
Maarten Lankhorst	12ca695d2c	drm/i915: Do not share hwsp across contexts any more, v8. Instead of sharing pages with breadcrumbs, give each timeline a single page. This allows unrelated timelines not to share locks any more during command submission. As an additional benefit, seqno wraparound no longer requires i915_vma_pin, which means we no longer need to worry about a potential -EDEADLK at a point where we are ready to submit. Changes since v1: - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle). - Extra check for completion in intel_read_hwsp(). Changes since v2: - Fix inconsistent indent in hwsp_alloc() (kbuild) - memset entire cacheline to 0. Changes since v3: - Do same in intel_timeline_reset_seqno(), and clflush for good measure. Changes since v4: - Use refcounting on timeline, instead of relying on i915_active. - Fix waiting on kernel requests. Changes since v5: - Bump amount of slots to maximum (256), for best wraparounds. - Add hwsp_offset to i915_request to fix potential wraparound hang. - Ensure timeline wrap test works with the changes. - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to fix a hang. Changes since v6: - Rename i915_request_active_offset to i915_request_active_seqno(), and elaborate the function. (tvrtko) Changes since v7: - Move hunk to where it belongs. (jekstrand) - Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com	2021-03-24 11:38:56 +01:00
Chris Wilson	baa7c2cd99	drm/i915: Refactor marking a request as EIO When wedging the device, we cancel all outstanding requests and mark them as EIO. Rather than duplicate the small function to do so between each submission backend, export one. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-3-chris@chris-wilson.co.uk	2021-01-09 18:29:07 +00:00
Chris Wilson	16f2941ad3	drm/i915/gt: Replace direct submit with direct call to tasklet Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit `3c53776e29` ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk	2020-12-24 15:02:35 +00:00
Chris Wilson	9bb36cf660	drm/i915: Check for rq->hwsp validity after acquiring RCU lock Since we allow removing the timeline map at runtime, there is a risk that rq->hwsp points into a stale page. To control that risk, we hold the RCU read lock while reading *rq->hwsp, but we missed a couple of important barriers. First, the unpinning / removal of the timeline map must be after all RCU readers into that map are complete, i.e. after an rcu barrier (in this case courtesy of call_rcu()). Secondly, we must make sure that the rq->hwsp we are about to dereference under the RCU lock is valid. In this case, we make the rq->hwsp pointer safe during i915_request_retire() and so we know that rq->hwsp may become invalid only after the request has been signaled. Therefore is the request is not yet signaled when we acquire rq->hwsp under the RCU, we know that rq->hwsp will remain valid for the duration of the RCU read lock. This is a very small window that may lead to either considering the request not completed (causing a delay until the request is checked again, any wait for the request is not affected) or dereferencing an invalid pointer. Fixes: `3adac4689f` ("drm/i915: Introduce concept of per-timeline (context) HWSP") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.1+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk	2020-12-18 18:49:40 +00:00
Chris Wilson	c744d50363	drm/i915/gt: Split the breadcrumb spinlock between global and contexts As we funnel more and more contexts into the breadcrumbs on an engine, the hold time of b->irq_lock grows. As we may then contend with the b->irq_lock during request submission, this increases the burden upon the engine->active.lock and so directly impacts both our execution latency and client latency. If we split the b->irq_lock by introducing a per-context spinlock to manage the signalers within a context, we then only need the b->irq_lock for enabling/disabling the interrupt and can avoid taking the lock for walking the list of contexts within the signal worker. Even with the current setup, this greatly reduces the number of times we have to take and fight for b->irq_lock. Furthermore, this closes the race between enabling the signaling context while it is in the process of being signaled and removed: <4>[ 416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870). <4>[ 416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4>[ 416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G U 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.208627] RIP: 0010:__list_add_valid+0x4d/0x70 <4>[ 416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8 <4>[ 416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086 <4>[ 416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105 <4>[ 416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff <4>[ 416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8 <4>[ 416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910 <4>[ 416.208669] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.208671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.208675] PKRU: 55555554 <4>[ 416.208677] Call Trace: <4>[ 416.208679] <IRQ> <4>[ 416.208751] i915_request_enable_breadcrumb+0x278/0x400 [i915] <4>[ 416.208839] __i915_request_submit+0xca/0x2a0 [i915] <4>[ 416.208892] __execlists_submission_tasklet+0x480/0x1830 [i915] <4>[ 416.208942] execlists_submission_tasklet+0xc4/0x130 [i915] <4>[ 416.208947] tasklet_action_common.isra.17+0x6c/0x1c0 <4>[ 416.208954] __do_softirq+0xdf/0x498 <4>[ 416.208960] ? handle_fasteoi_irq+0x150/0x150 <4>[ 416.208964] asm_call_on_stack+0xf/0x20 <4>[ 416.208966] </IRQ> <4>[ 416.208969] do_softirq_own_stack+0xa1/0xc0 <4>[ 416.208972] irq_exit_rcu+0xb5/0xc0 <4>[ 416.208976] common_interrupt+0xf7/0x260 <4>[ 416.208980] asm_common_interrupt+0x1e/0x40 <4>[ 416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410 <4>[ 416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48 <4>[ 416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206 <4>[ 416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000 <4>[ 416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f <4>[ 416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000 <4>[ 416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f <4>[ 416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003 <4>[ 416.209012] cpuidle_enter+0x24/0x40 <4>[ 416.209016] do_idle+0x22f/0x2d0 <4>[ 416.209022] cpu_startup_entry+0x14/0x20 <4>[ 416.209025] start_secondary+0x158/0x1a0 <4>[ 416.209030] secondary_startup_64+0xa4/0xb0 <4>[ 416.209039] irq event stamp: 10186977 <4>[ 416.209042] hardirqs last enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0 <4>[ 416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50 <4>[ 416.209047] softirqs last enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70 <4>[ 416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20 <4>[ 416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100) <4>[ 416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G U W 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b <4>[ 416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086 <4>[ 416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104 <4>[ 416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff <4>[ 416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18 <4>[ 416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880 <4>[ 416.209317] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.209317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.209317] PKRU: 55555554 <4>[ 416.209317] Call Trace: <4>[ 416.209317] <IRQ> <4>[ 416.209317] remove_signaling_context.isra.13+0xd/0x70 [i915] <4>[ 416.209513] signal_irq_work+0x1f7/0x4b0 [i915] This is caused by virtual engines where although we take the breadcrumb lock on each of the active engines, they may be different engines on different requests, It turns out that the b->irq_lock was not a sufficient proxy for the engine->active.lock in the case of more than one request, so introduce an explicit lock around ce->signals. v2: ce->signal_lock is acquired with only RCU protection and so must be treated carefully and not cleared during reallocation. We also then need to confirm that the ce we lock is the same as we found in the breadcrumb list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276 Fixes: `c18636f763` ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: `2854d86632` ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-4-chris@chris-wilson.co.uk	2020-11-26 18:35:03 +00:00
Chris Wilson	6cfe66eb71	drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Make b->signaled_requests a lockless-list so that we can manipulate it outside of the b->irq_lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-2-chris@chris-wilson.co.uk	2020-11-23 17:26:38 +00:00
Chris Wilson	562675d09a	drm/i915/gt: Update request status flags for debug pretty-printer We plan to expand upon the number of available statuses for when we pretty-print the requests along the timelines, and so need a new set of flags. We have settled upon: Unready [U] - initial status after being submitted, the request is not ready for execution as it is waiting for external fences Ready [R] - all fences the request was waiting on have been signaled, and the request is now ready for execution and will be in a backend queue - a ready request may still need to wait on semaphores [internal fences] Ready/virtual [V] - same as ready, but queued over multiple backends Executing [E] - the request has been transferred from the backend queue and submitted for execution on HW - a completed request may still be regarded as executing, its status may not be updated until it is retired and removed from the lists Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk	2020-11-19 20:34:14 +00:00
Chris Wilson	1f0e785a9c	drm/i915: Lift i915_request_show() Extract i915_request_show for reuse in other request chain pretty printers. For a bonus point, quietly change the seqno format from %llx to %lld to match everywhere else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk	2020-11-19 20:31:04 +00:00
Chris Wilson	b3786b2937	drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: `4fe6abb8f5` ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-09-07 14:23:55 +03:00
Chris Wilson	27a5dcfe73	drm/i915/gem: Remove disordered per-file request list for throttling I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file. That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request. It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-09-07 13:13:50 +03:00
Chris Wilson	e971fe9128	drm/i915: Mark up inline getters as taking a const i915_request Since these inline routines only return the desired pointer from the i915_request(after checking the preconditions for acquiring said pointer), they can be const. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200616183139.4061-1-chris@chris-wilson.co.uk	2020-06-16 21:13:58 +01:00
Chris Wilson	5a83399536	drm/i915: Drop i915_request.i915 backpointer We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the most frequently allocated struct vs an 3 bytes and pointer chasing in using rq->engine->i915? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk	2020-06-03 13:53:39 +01:00
Chris Wilson	fc0e127022	drm/i915: Improve execute_cb struct packing Reduce the irq_work llist for attaching the callbacks to the signal for both smaller structs (two fewer pointers!) and simpler [debug] code: Function old new delta irq_execute_cb 35 34 -1 __igt_breadcrumbs_smoketest 1684 1682 -2 i915_request_retire 2003 1996 -7 __i915_request_create 1047 1040 -7 __notify_execute_cb 135 126 -9 __i915_request_ctor 188 178 -10 __await_execution.part.constprop 451 440 -11 igt_wait_request 924 714 -210 One minor artifact is that the order of cb exection is reversed. No current use cases are affected by that change. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk	2020-05-26 12:24:28 +01:00
Chris Wilson	18e4af04d2	drm/i915: Drop no-semaphore boosting Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk	2020-05-14 06:14:33 +01:00
Chris Wilson	795d4d7fa3	drm/i915: Mark the addition of the initial-breadcrumb in the request The initial-breadcrumb is used to mark the end of the awaiting and the beginning of the user payload. We verify that we do not start the user payload before all signaler are completed, checking our semaphore setup by looking for the initial breadcrumb being written too early. We also want to ensure that we do not add semaphore waits after we have already closed the semaphore section, an issue for later deferred waits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513165937.9508-2-chris@chris-wilson.co.uk	2020-05-13 21:09:54 +01:00
Chris Wilson	43acd6516c	drm/i915: Keep a per-engine request pool Add a tiny per-engine request mempool so that we should always have a request available for powermanagement allocations from tricky contexts. This reserve is expected to be only used for kernel contexts when barriers must be emitted [almost] without fail. The main consumer for this reserved request is expected to be engine-pm, for which we know that there will always be at least the previous pm request that we can reuse under mempressure (so there should always be a spare request for engine_park()). This is an alternative to using a comparatively bulky mempool, which requires custom handling for both our reserved allocation requirement and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool would be that it would allow us to keep a larger per-engine request pool. However, converting over to mempool is straightforward should the need arise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-and-tested-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402184037.21630-1-chris@chris-wilson.co.uk	2020-04-03 15:20:03 +01:00
Chris Wilson	209df10bb4	drm/i915: Defer semaphore priority bumping to a workqueue Since the semaphore fence may be signaled from inside an interrupt handler from inside a request holding its request->lock, we cannot then enter into the engine->active.lock for processing the semaphore priority bump as we may traverse our call tree and end up on another held request. CPU 0: [ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0 [ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915] [ 2243.218869] semaphore_notify+0x6d/0x98 [i915] [ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2243.218874] ? kmem_cache_free+0x211/0x290 [ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915] [ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915] [ 2243.218881] signal_irq_work+0x571/0x690 [i915] [ 2243.218883] irq_work_run_list+0xd7/0x120 [ 2243.218885] irq_work_run+0x1d/0x50 [ 2243.218887] smp_irq_work_interrupt+0x21/0x30 [ 2243.218889] irq_work_interrupt+0xf/0x20 CPU 1: [ 2242.173107] _raw_spin_lock+0x8f/0xa0 [ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915] [ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915] [ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915] [ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915] [ 2242.173119] submit_notify+0x8f/0xc0 [i915] [ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40 [ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915] [ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915] Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318 Fixes: `b7404c7ecb` ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk	2020-03-10 23:12:38 +00:00
Chris Wilson	89f077ab90	drm/i915: Mark up unlocked update of i915_request.hwsp_seqno During i915_request_retire() we decouple the i915_request.hwsp_seqno from the intel_timeline so that it may be freed before the request is released. However, we need to warn the compiler that the pointer may update under its nose. [ 171.438899] BUG: KCSAN: data-race in i915_request_await_dma_fence [i915] / i915_request_retire [i915] [ 171.438920] [ 171.438932] write to 0xffff8881e7e28ce0 of 8 bytes by task 148 on cpu 2: [ 171.439174] i915_request_retire+0x1ea/0x660 [i915] [ 171.439408] retire_requests+0x7a/0xd0 [i915] [ 171.439640] engine_retire+0xa1/0xe0 [i915] [ 171.439657] process_one_work+0x3b1/0x690 [ 171.439671] worker_thread+0x80/0x670 [ 171.439685] kthread+0x19a/0x1e0 [ 171.439701] ret_from_fork+0x1f/0x30 [ 171.439721] [ 171.439739] read to 0xffff8881e7e28ce0 of 8 bytes by task 696 on cpu 1: [ 171.439990] i915_request_await_dma_fence+0x162/0x520 [i915] [ 171.440230] i915_request_await_object+0x2fe/0x470 [i915] [ 171.440467] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 171.440704] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 171.440722] drm_ioctl_kernel+0xe4/0x120 [ 171.440736] drm_ioctl+0x297/0x4c7 [ 171.440750] ksys_ioctl+0x89/0xb0 [ 171.440766] __x64_sys_ioctl+0x42/0x60 [ 171.440788] do_syscall_64+0x6e/0x2c0 [ 171.440802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-1-chris@chris-wilson.co.uk	2020-03-09 18:23:59 +00:00
Chris Wilson	36e191f064	drm/i915: Apply i915_request_skip() on submission Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk	2020-03-04 14:29:50 +00:00
Chris Wilson	f1766e3a78	drm/i915: Fix typo in kerneldoc function name A forgetful copy'n'paste left the name of the old function intact, and did not introduce the new function 'i915_request_is_ready' Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200117101639.2908469-1-chris@chris-wilson.co.uk	2020-01-17 13:20:18 +00:00
Chris Wilson	32ff621fd7	drm/i915/gt: Allow temporary suspension of inflight requests In order to support out-of-line error capture, we need to remove the active request from HW and put it to one side while a worker compresses and stores all the details associated with that request. (As that compression may take an arbitrary user-controlled amount of time, we want to let the engine continue running on other workloads while the hanging request is dumped.) Not only do we need to remove the active request, but we also have to remove its context and all requests that were dependent on it (both in flight, queued and future submission). Finally once the capture is complete, we need to be able to resubmit the request and its dependents and allow them to execute. v2: Replace stack recursion with a simple list. v3: Check all the parents, not just the first, when searching for a stuck ancestor! References: https://gitlab.freedesktop.org/drm/intel/issues/738 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk	2020-01-16 19:56:16 +00:00
Chris Wilson	672c368f93	drm/i915: Keep track of request among the scheduling lists If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk	2020-01-16 19:56:15 +00:00
Chris Wilson	e1c31fb5dd	drm/i915: Merge i915_request.flags with i915_request.fence.flags As we already have a flags field buried within i915_request, reuse it! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk	2020-01-06 14:38:55 +00:00
Chris Wilson	41d329e287	drm/i915: Add spaces before compound GEM_TRACE Add a space between the prefixed format and the users format so that the join are not mistakenly combined into one long word. Fixes: `639f2f2489` ("drm/i915: Introduce new macros for tracing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191223204411.2355304-1-chris@chris-wilson.co.uk	2019-12-25 12:14:34 +00:00
Chris Wilson	6a8679c048	drm/i915: Mark the GEM context link as RCU protected The only protection for intel_context.gem_cotext is granted by RCU, so annotate it as a rcu protected pointer and carefully dereference it in the few occasions we need to use it. Fixes: `9f3ccd40ac` ("drm/i915: Drop GEM context as a direct link from i915_request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222233558.2201901-1-chris@chris-wilson.co.uk	2019-12-23 13:08:47 +00:00
Chris Wilson	9f3ccd40ac	drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk	2019-12-20 10:52:21 +00:00
Chris Wilson	b81e4d9b59	drm/i915/gt: Track engine round-trip times Knowing the round trip time of an engine is useful for tracking the health of the system as well as providing a metric for the baseline responsiveness of the engine. We can use the latter metric for automatically tuning our waits in selftests and when idling so we don't confuse a slower system with a dead one. Upon idling the engine, we send one last pulse to switch the context away from precious user state to the volatile kernel context. We know the engine is idle at this point, and the pulse is non-preemptible, so this provides us with a good measurement of the round trip time. It also provides us with faster engine parking for ringbuffer submission, which is a welcome bonus (e.g. softer-rc6). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191219105043.4169050-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191219124353.8607-2-chris@chris-wilson.co.uk	2019-12-19 17:03:57 +00:00
Chris Wilson	85bedbf191	drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp As we stash a pointer to the HWSP cacheline on the request, when reading it we only need confirm that the cacheline is still valid by checking that the request and timeline are still intact. v2: Protect hwsp_cachline with RCU Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217011659.3092130-1-chris@chris-wilson.co.uk	2019-12-17 16:59:48 +00:00
Venkata Sandeep Dhanalakota	639f2f2489	drm/i915: Introduce new macros for tracing New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are introduce to tag device name and engine name with contexts and requests tracing in i915. Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com	2019-12-13 20:16:23 +00:00
Chris Wilson	c3eb54aad9	drm/i915: Mark up "sentinel" requests Sometimes we want to emit a terminator request, a request that flushes the pipeline and allows no request to come after it. This can be used for a "preempt-to-idle" to ensure that upon processing the context-switch to that request, all other active contexts have been flushed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk	2019-10-12 08:51:17 +01:00
Chris Wilson	6610197542	drm/i915: Move request runtime management onto gt Requests are run from the gt and are tided into the gt runtime power management, so pull the runtime request management under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk	2019-10-04 15:39:26 +01:00
Chris Wilson	f33a8a5160	drm/i915: Merge wait_for_timelines with retire_request wait_for_timelines is essentially the same loop as retiring requests (with an extra timeout), so merge the two into one routine. v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk	2019-10-04 15:39:23 +01:00
Chris Wilson	7e80576266	drm/i915: Drop struct_mutex from around i915_retire_requests() We don't need to hold struct_mutex now for retiring requests, so drop it from i915_retire_requests() and i915_gem_wait_for_idle(), finally removing I915_WAIT_LOCKED for good. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-8-chris@chris-wilson.co.uk	2019-10-04 15:39:17 +01:00
Chris Wilson	b1e3177bd1	drm/i915: Coordinate i915_active with its own mutex Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree. Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. v2: Add some comments that barely elucidate anything :( Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk	2019-10-04 15:39:12 +01:00
Chris Wilson	c0bb487dc1	drm/i915: Only enqueue already completed requests If we are asked to submit a completed request, just move it onto the active-list without modifying it's payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk	2019-09-23 16:21:37 +01:00
Chris Wilson	d19d71fc2b	drm/i915: Mark i915_request.timeline as a volatile, rcu pointer The request->timeline is only valid until the request is retired (i.e. before it is completed). Upon retiring the request, the context may be unpinned and freed, and along with it the timeline may be freed. We therefore need to be very careful when chasing rq->timeline that the pointer does not disappear beneath us. The vast majority of users are in a protected context, either during request construction or retirement, where the timeline->mutex is held and the timeline cannot disappear. It is those few off the beaten path (where we access a second timeline) that need extra scrutiny -- to be added in the next patch after first adding the warnings about dangerous access. One complication, where we cannot use the timeline->mutex itself, is during request submission onto hardware (under spinlocks). Here, we want to check on the timeline to finalize the breadcrumb, and so we need to impose a second rule to ensure that the request->timeline is indeed valid. As we are submitting the request, it's context and timeline must be pinned, as it will be used by the hardware. Since it is pinned, we know the request->timeline must still be valid, and we cannot submit the idle barrier until after we release the engine->active.lock, ergo while submitting and holding that spinlock, a second thread cannot release the timeline. v2: Don't be lazy inside selftests; hold the timeline->mutex for as long as we need it, and tidy up acquiring the timeline with a bit of refactoring (i915_active_add_request) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk	2019-09-20 10:24:09 +01:00
Chris Wilson	e5dadff4b0	drm/i915: Protect request retirement with timeline->mutex Forgo the struct_mutex requirement for request retirement as we have been transitioning over to only using the timeline->mutex for controlling the lifetime of a request on that timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk	2019-08-15 23:21:13 +01:00
Chris Wilson	a79ca656b6	drm/i915: Push the wakeref->count deferral to the backend If the backend wishes to defer the wakeref parking, make it responsible for unlocking the wakeref (i.e. bumping the counter). This allows it to time the unlock much more carefully in case it happens to needs the wakeref to be active during its deferral. For instance, during engine parking we may choose to emit an idle barrier (a request). To do so, we borrow the engine->kernel_context timeline and to ensure exclusive access we keep the engine->wakeref.count as 0. However, to submit that request to HW may require a intel_engine_pm_get() (e.g. to keep the submission tasklet alive) and before we allow that we have to rewake our wakeref to avoid a recursive deadlock. <4> [257.742916] IRQs not enabled as expected <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100 <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U W 5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1 <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100 <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3 <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086 <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302 <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001 <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008 <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430 <4> [257.743074] FS: 0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000 <4> [257.743082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0 <4> [257.743096] Call Trace: <4> [257.743104] <IRQ> <4> [257.743265] __i915_request_commit+0x240/0x5d0 [i915] <4> [257.743427] ? __i915_request_create+0x228/0x4c0 [i915] <4> [257.743584] __engine_park+0x64/0x250 [i915] <4> [257.743730] ____intel_wakeref_put_last+0x1c/0x70 [i915] <4> [257.743878] i915_sample+0x2ee/0x310 [i915] <4> [257.744030] ? i915_pmu_cpu_offline+0xb0/0xb0 [i915] <4> [257.744040] __hrtimer_run_queues+0x11e/0x4b0 <4> [257.744068] hrtimer_interrupt+0xea/0x250 <4> [257.744079] ? lockdep_hardirqs_off+0x79/0xd0 <4> [257.744101] smp_apic_timer_interrupt+0x96/0x280 <4> [257.744114] apic_timer_interrupt+0xf/0x20 <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae v2: Keep the priority_hint assert v3: That assert was desperately trying to point out my bug. Sorry, little assert. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk	2019-08-13 21:09:49 +01:00
Lionel Landwerlin	2a98f4e65b	drm/i915: add infrastructure to hold off preemption on a request We want to set this flag in the next commit on requests containing perf queries so that the result of the perf query can just be a delta of global counters, rather than doing post processing of the OA buffer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [ickle: add basic selftest for nopreempt] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk	2019-07-09 21:26:40 +01:00
Tvrtko Ursulin	f0c02c1b91	drm/i915: Rename i915_timeline to intel_timeline and move under gt Move all timeline code under gt and rename to intel_gt prefix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com	2019-06-21 13:48:53 +01:00
Chris Wilson	22b7a426bb	drm/i915/execlists: Preempt-to-busy When using a global seqno, we required a precise stop-the-workd event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. v2: Be a tinge paranoid and ensure we flush the write into the HWS page for the GPU semaphore to pick in a timely fashion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk	2019-06-20 16:52:36 +01:00
Chris Wilson	422d7df4f0	drm/i915: Replace engine->timeline with a plain list To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk	2019-06-14 19:03:40 +01:00
Chris Wilson	f71e01a78b	drm/i915: Extend execution fence to support a callback In the next patch, we will want to configure the slave request depending on which physical engine the master request is executed on. For this, we introduce a callback from the execute fence to convey this information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk	2019-05-22 08:40:45 +01:00
Chris Wilson	78e41ddd21	drm/i915: Apply an execution_mask to the virtual_engine Allow the user to direct which physical engines of the virtual engine they wish to execute one, as sometimes it is necessary to override the load balancing algorithm. v2: Only kick the virtual engines on context-out if required Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk	2019-05-22 08:40:43 +01:00
Chris Wilson	46472b3efb	drm/i915: Move i915_request_alloc into selftests/ Having transitioned GEM over to using intel_context as its primary means of tracking the GEM context and engine combined and using i915_request_create(), we can move the older i915_request_alloc() helper function into selftests/ where the remaining users are confined. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-9-chris@chris-wilson.co.uk	2019-04-26 18:32:20 +01:00

1 2

84 Commits