Commit Graph

20 Commits

Author SHA1 Message Date
Bjorn Andersson 02b9f26362 drm/msm/gpu: Drop qos request if devm_devfreq_add_device() fails
In the event that devm_devfreq_add_device() fails the device's qos freq
list is left referencing df->idle_freq and df->boost_freq. Attempting to
initialize devfreq again after a probe deferral will then cause invalid
memory accesses in dev_pm_qos_add_request().

Fix this by dropping the requests in the error path.

Fixes: 7c0ffcd40b ("drm/msm/gpu: Respect PM QoS constraints")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/493001/
Link: https://lore.kernel.org/r/20220708162632.3529864-1-bjorn.andersson@linaro.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-08-15 10:19:18 -07:00
Douglas Anderson 6694482a70 drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
From testing on sc7180-trogdor devices, reading the GMU registers
needs the GMU clocks to be enabled. Those clocks get turned on in
a6xx_gmu_resume(). Confusingly enough, that function is called as a
result of the runtime_pm of the GPU "struct device", not the GMU
"struct device". Unfortunately the current a6xx_gpu_busy() grabs a
reference to the GMU's "struct device".

The fact that we were grabbing the wrong reference was easily seen to
cause crashes that happen if we change the GPU's pm_runtime usage to
not use autosuspend. It's also believed to cause some long tail GPU
crashes even with autosuspend.

We could look at changing it so that we do pm_runtime_get_if_in_use()
on the GPU's "struct device", but then we run into a different
problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
"struct device" the whole time when we're in the "autosuspend
delay". That is, when we drop the last reference to the GPU but we're
waiting a period before actually suspending then we'll think the GPU
is off. One reason that's bad is that if the GPU didn't actually turn
off then the cycle counter doesn't lose state and that throws off all
of our calculations.

Let's change the code to keep track of the suspend state of
devfreq. msm_devfreq_suspend() is always called before we actually
suspend the GPU and msm_devfreq_resume() after we resume it. This
means we can use the suspended state to know if we're powered or not.

NOTE: one might wonder when exactly our status function is called when
devfreq is supposed to be disabled. The stack crawl I captured was:
  msm_devfreq_get_dev_status
  devfreq_simple_ondemand_func
  devfreq_update_target
  qos_notifier_call
  qos_max_notifier_call
  blocking_notifier_call_chain
  pm_qos_update_target
  freq_qos_apply
  apply_constraint
  __dev_pm_qos_update_request
  dev_pm_qos_update_request
  msm_devfreq_idle_work

Fixes: eadf79286a ("drm/msm: Check for powered down HW in the devfreq callbacks")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/489124/
Link: https://lore.kernel.org/r/20220610124639.v4.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-07-06 08:38:06 -07:00
Wan Jiabing 4400c3a1d4 drm/msm: Use div64_ul instead of do_div
Fix following coccicheck warning:
drivers/gpu/drm/msm/msm_gpu_devfreq.c:72:1-7: WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead.

Use div64_ul instead of do_div to avoid a possible truncation.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Patchwork: https://patchwork.freedesktop.org/patch/483499/
Link: https://lore.kernel.org/r/20220426132126.686447-1-wanjiabing@vivo.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2022-07-05 06:01:11 +03:00
Chia-I Wu 78f815c1cf drm/msm: return the average load over the polling period
simple_ondemand interacts poorly with clamp_to_idle.  It only looks at
the load since the last get_dev_status call, while it should really look
at the load over polling_ms.  When clamp_to_idle true, it almost always
picks the lowest frequency on active because the gpu is idle between
msm_devfreq_idle/msm_devfreq_active.

This logic could potentially be moved into devfreq core.

Fixes: 7c0ffcd40b ("drm/msm/gpu: Respect PM QoS constraints")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220416003314.59211-3-olvaffe@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21 15:05:23 -07:00
Chia-I Wu 15c411980b drm/msm: simplify gpu_busy callback
Move tracking and busy time calculation to msm_devfreq_get_dev_status.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220416003314.59211-2-olvaffe@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21 15:05:23 -07:00
Chia-I Wu 69f06a5d85 drm/msm: remove explicit devfreq status reset
It is redundant since commit 7c0ffcd40b ("drm/msm/gpu: Respect PM QoS
constraints") because dev_pm_qos_update_request triggers get_dev_status.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220416003314.59211-1-olvaffe@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-04-21 15:05:23 -07:00
Rob Clark 05afd57f4d drm/msm/gpu: Fix crash on devices without devfreq support (v2)
Avoid going down devfreq paths on devices where devfreq is not
initialized.

v2: Change has_devfreq() logic [Dmitry]

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 6aa89ae1fb ("drm/msm/gpu: Cancel idle/boost work on suspend")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20220308184844.1121029-1-robdclark@gmail.com
2022-03-08 13:55:23 -08:00
Rob Clark 6aa89ae1fb drm/msm/gpu: Cancel idle/boost work on suspend
With system suspend using pm_runtime_force_suspend() we can't rely on
the pm_runtime_get_if_in_use() trick to deal with devfreq callbacks
after (or racing with) suspend.  So flush any pending idle or boost
work in the suspend path.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20220108180913.814448-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2022-01-25 08:54:41 -08:00
Rob Clark 7c0ffcd40b drm/msm/gpu: Respect PM QoS constraints
Re-work the boost and idle clamping to use PM QoS requests instead, so
they get aggreggated with other requests (such as cooling device).

This does have the minor side-effect that devfreq sysfs min_freq/
max_freq files now reflect the boost and idle clamping, as they show
(despite what they are documented to show) the aggregated min/max freq.
Fixing that in devfreq does not look straightforward after considering
that OPPs can be dynamically added/removed.  However writes to the
sysfs files still behave as expected.

v2: Use 64b math to avoid potential 32b overflow

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211120200103.1051459-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-28 10:01:40 -08:00
Akhil P Oommen 2a1ac5ba90 drm/msm: Increase gpu boost interval
Currently, we boost gpu freq after 25ms of inactivity. This regresses
some of the 30 fps usecases where the workload on gpu (at 33ms internval)
is very small which it can finish at the lowest OPP before the deadline.
Lets increase this inactivity threshold to 50ms (same as the current
devfreq interval) to fix this.

Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20211118154903.1.I2ed37cd8ad45a5a94d9de53330f973a62bd1fb29@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-28 10:00:20 -08:00
Rob Clark 5dbe2711e4 drm/msm/gpu: Fix check for devices without devfreq
Looks like 658f4c8296 ("drm/msm/devfreq: Add 1ms delay before
clamping freq") was badly rebased on top of efb8a170a3 ("drm/msm:
Fix devfreq NULL pointer dereference on a3xx") and ended up with
the NULL check in the wrong place.

Fixes: 658f4c8296 ("drm/msm/devfreq: Add 1ms delay before clamping freq")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20211120200103.1051459-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-21 12:56:51 -08:00
Rob Clark 26b6f1c870 drm/msm/gpu: Fix idle_work time
This was supposed to be a relative timer, not absolute.

Fixes: 658f4c8296 ("drm/msm/devfreq: Add 1ms delay before clamping freq")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20211120200103.1051459-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-21 12:56:30 -08:00
Rob Clark 59ba1b2b48 drm/msm/devfreq: Fix OPP refcnt leak
Reported-by: Douglas Anderson <dianders@chromium.org>
Fixes: 9bc9557017 ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-By: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Akhil P Oommen <akhilpo@codeaurora.org>
Link: https://lore.kernel.org/r/20211105202021.181092-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-11-21 12:46:28 -08:00
Dave Airlie de99e64798 Merge tag 'drm-msm-next-2021-10-26' of https://gitlab.freedesktop.org/drm/msm into drm-next
* eDP support in DP sub-driver (for newer SoCs with native eDP output)
* dpu irq handling cleanup
* CRC support for making igt happy
* Support for NO_CONNECTOR bridges
* dsi: 14nm phy support for msm8953
* mdp5: support for msm8x53, sdm450, sdm632
* various smaller fixes and cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGsH9EwcpqGNNRJeL99NvFFjHX3SUg+nTYu0dHG5U9+QuA@mail.gmail.com
2021-10-28 15:07:48 +10:00
Rob Clark 5ca6779d2f drm/msm/devfreq: Restrict idle clamping to a618 for now
Until we better understand the stability issues caused by frequent
frequency changes, lets limit them to a618.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Caleb Connolly <caleb.connolly@linaro.org>
Link: https://lore.kernel.org/r/20211018153627.2787882-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-10-18 14:31:57 -07:00
Rob Clark 658f4c8296 drm/msm/devfreq: Add 1ms delay before clamping freq
Add a short delay before clamping to idle frequency on active->idle
transition.  It takes ~0.5ms to increase the freq again on the next
idle->active transition, so this helps avoid extra freq transitions
on workloads that bounce between CPU and GPU.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210927230455.1066297-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-10-15 16:35:40 -07:00
Stephan Gerhold efb8a170a3 drm/msm: Fix devfreq NULL pointer dereference on a3xx
There is no devfreq on a3xx at the moment since gpu_busy is not
implemented. This means that msm_devfreq_init() will return early
and the entire devfreq setup is skipped.

However, msm_devfreq_active() and msm_devfreq_idle() are still called
unconditionally later, causing a NULL pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
  Internal error: Oops: 96000004 [#1] PREEMPT SMP
  CPU: 0 PID: 133 Comm: ring0 Not tainted 5.15.0-rc1 #4
  Hardware name: Longcheer L8150 (DT)
  pc : mutex_lock_io+0x2bc/0x2f0
  lr : msm_devfreq_active+0x3c/0xe0 [msm]
  Call trace:
   mutex_lock_io+0x2bc/0x2f0
   msm_gpu_submit+0x164/0x180 [msm]
   msm_job_run+0x54/0xe0 [msm]
   drm_sched_main+0x2b0/0x4a0 [gpu_sched]
   kthread+0x154/0x160
   ret_from_fork+0x10/0x20

Fix this by adding a check in msm_devfreq_active/idle() which ensures
that devfreq was actually initialized earlier.

Fixes: 9bc9557017 ("drm/msm: Devfreq tuning")
Reported-by: Nikita Travkin <nikita@trvn.ru>
Tested-by: Nikita Travkin <nikita@trvn.ru>
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20210913164556.16284-1-stephan@gerhold.net
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-10-11 17:30:53 -07:00
Rob Clark 9bc9557017 drm/msm: Devfreq tuning
This adds a few things to try and make frequency scaling better match
the workload:

1) Longer polling interval to avoid whip-lashing between too-high and
   too-low frequencies in certain workloads, like mobile games which
   throttle themselves to 30fps.

   Previously our polling interval was short enough to let things
   ramp down to minimum freq in the "off" frame, but long enough to
   not react quickly enough when rendering started on the next frame,
   leading to uneven frame times.  (Ie. rather than a consistent 33ms
   it would alternate between 16/33/48ms.)

2) Awareness of when the GPU is active vs idle.  Since we know when
   the GPU is active vs idle, we can clamp the frequency down to the
   minimum while it is idle.  (If it is idle for long enough, then
   the autosuspend delay will eventually kick in and power down the
   GPU.)

   Since devfreq has no knowledge of powered-but-idle, this takes a
   small bit of trickery to maintain a "fake" frequency while idle.
   This, combined with the longer polling period allows devfreq to
   arrive at a reasonable "active" frequency, while still clamping
   to minimum freq when idle to reduce power draw.

3) Boost.  Because simple_ondemand needs to see a certain threshold
   of busyness to ramp up, we could end up needing multiple polling
   cycles before it reacts appropriately on interactive workloads
   (ex. scrolling a web page after reading for some time), on top
   of the already lengthened polling interval, when we see a idle
   to active transition after a period of idle time we boost the
   frequency that we return to.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Link: https://lore.kernel.org/r/20210726144653.2180096-4-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 17:54:36 -07:00
Rob Clark 552fce98b0 drm/msm: Split out get_freq() helper
In the next patch, it grows a bit more, so lets not duplicate the logic
in multiple places.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210726144653.2180096-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 17:54:36 -07:00
Rob Clark af5b4fff0f drm/msm: Split out devfreq handling
Before we start adding more cleverness, split it into it's own file.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210726144653.2180096-2-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-07-27 17:54:36 -07:00