[PATCH] drm/i915: Don't wait forever in drop_caches

Jani Nikula jani.nikula at linux.intel.com
Wed Nov 2 12:12:00 UTC 2022


On Tue, 01 Nov 2022, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> At the end of each test, IGT does a drop caches call via sysfs with

sysfs? This is debugfs (i915_debugfs.c), not sysfs.

> special flags set. One of the possible paths waits for idle with an
> infinite timeout. That causes problems for debugging issues when CI
> catches a "can't go idle" test failure. Best case, the CI system times
> out (after 90s), attempts a bunch of state dump actions and then
> reboots the system to recover it. Worst case, the CI system can't do
> anything at all and then times out (after 1000s) and simply reboots.
> Sometimes a serial port log of dmesg might be available, sometimes not.
>
> So rather than making life hard for ourselves, change the timeout to
> be 10s rather than infinite. Also, trigger the standard
> wedge/reset/recover sequence so that testing can continue with a
> working system (if possible).
>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ae987e92251dd..9d916fbbfc27c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -641,6 +641,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
>  		  DROP_RESET_ACTIVE | \
>  		  DROP_RESET_SEQNO | \
>  		  DROP_RCU)
> +
> +#define DROP_IDLE_TIMEOUT	(HZ * 10)

I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.

I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.

I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.

I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.

My head spins.


BR,
Jani.


> +
>  static int
>  i915_drop_caches_get(void *data, u64 *val)
>  {
> @@ -661,7 +664,9 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
>  		intel_gt_retire_requests(gt);
>  
>  	if (val & (DROP_IDLE | DROP_ACTIVE)) {
> -		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
> +		ret = intel_gt_wait_for_idle(gt, DROP_IDLE_TIMEOUT);
> +		if (ret == -ETIME)
> +			intel_gt_set_wedged(gt);
>  		if (ret)
>  			return ret;
>  	}

-- 
Jani Nikula, Intel Open Source Graphics Center


More information about the dri-devel mailing list