[PATCH] drm/i915: Don't wait forever in drop_caches
Jani Nikula
jani.nikula at linux.intel.com
Wed Nov 2 12:12:00 UTC 2022
On Tue, 01 Nov 2022, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> At the end of each test, IGT does a drop caches call via sysfs with
sysfs? Judging by the file patched below (i915_debugfs.c), this is debugfs.
> special flags set. One of the possible paths waits for idle with an
> infinite timeout. That causes problems for debugging issues when CI
> catches a "can't go idle" test failure. Best case, the CI system times
> out (after 90s), attempts a bunch of state dump actions and then
> reboots the system to recover it. Worst case, the CI system can't do
> anything at all and then times out (after 1000s) and simply reboots.
> Sometimes a serial port log of dmesg might be available, sometimes not.
>
> So rather than making life hard for ourselves, change the timeout to
> be 10s rather than infinite. Also, trigger the standard
> wedge/reset/recover sequence so that testing can continue with a
> working system (if possible).
>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ae987e92251dd..9d916fbbfc27c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -641,6 +641,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
>  		  DROP_RESET_ACTIVE | \
>  		  DROP_RESET_SEQNO | \
>  		  DROP_RCU)
> +
> +#define DROP_IDLE_TIMEOUT (HZ * 10)
I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.
I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.
I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.
I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.
My head spins.
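
To make the unit mismatch concrete, here is a small userspace sketch (not
kernel code) that converts a millisecond timeout into ticks so it can be
compared with a jiffies-based value such as HZ * 10. SKETCH_HZ and
sketch_ms_to_ticks() are hypothetical names, and the tick rate of 250 is an
assumption, since HZ is a kernel build-time setting. In the kernel itself,
msecs_to_jiffies() performs this conversion.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 250 ticks per second; the real HZ is set at kernel build time. */
#define SKETCH_HZ 250UL

/* Hypothetical helper: convert milliseconds to ticks, rounding up. */
static unsigned long sketch_ms_to_ticks(unsigned long ms)
{
	/* Guard against overflow in the arithmetic below. */
	assert(ms <= (ULONG_MAX - 999UL) / SKETCH_HZ);
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}
```

Under these assumptions, a 10000 ms timeout and a HZ * 10 jiffies timeout
describe the same ten-second wait, so the two kinds of define are only
comparable after such a conversion.
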
BR,
Jani.
> +
>  static int
>  i915_drop_caches_get(void *data, u64 *val)
>  {
> @@ -661,7 +664,9 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>  		intel_gt_retire_requests(gt);
>
>  	if (val & (DROP_IDLE | DROP_ACTIVE)) {
> -		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
> +		ret = intel_gt_wait_for_idle(gt, DROP_IDLE_TIMEOUT);
> +		if (ret == -ETIME)
> +			intel_gt_set_wedged(gt);
>  		if (ret)
>  			return ret;
>  	}
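
The control flow in the hunk above (wait with a finite timeout, mark the GT
wedged on -ETIME, and return the error either way) can be modelled in plain
userspace C. This is only a sketch: struct sketch_gt, sketch_wait_for_idle()
and sketch_drop_caches() are hypothetical stand-ins, not the i915 functions,
and "time" is reduced to a tick count stored in the struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GT; not the real struct intel_gt. */
struct sketch_gt {
	long ticks_until_idle;	/* work left in ticks; negative: never idles */
	bool wedged;
};

/* Model of a bounded wait: 0 if idle within the timeout, else -ETIME. */
static int sketch_wait_for_idle(const struct sketch_gt *gt, long timeout)
{
	assert(timeout >= 0);
	if (gt->ticks_until_idle >= 0 && gt->ticks_until_idle <= timeout)
		return 0;
	return -ETIME;
}

/* Mirrors the patched flow: finite wait, wedge on timeout, return the error. */
static int sketch_drop_caches(struct sketch_gt *gt, long timeout)
{
	int ret = sketch_wait_for_idle(gt, timeout);

	if (ret == -ETIME)
		gt->wedged = true;
	return ret;
}
```

A GT that never idles comes back with -ETIME and the wedged flag set, while
one that idles inside the timeout returns 0 and is left untouched.
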
--
Jani Nikula, Intel Open Source Graphics Center