[PATCH v5 01/16] drm/sched: Document what the timedout_job method should do
Daniel Vetter
daniel at ffwll.ch
Tue Jun 29 09:05:43 UTC 2021
On Tue, Jun 29, 2021 at 09:34:55AM +0200, Boris Brezillon wrote:
> The documentation is a bit vague and doesn't really describe what
> ->timedout_job() is expected to do. Let's add a few more details.
>
> v5:
> * New patch
>
> Suggested-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> Signed-off-by: Boris Brezillon <boris.brezillon at collabora.com>
Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
> include/drm/gpu_scheduler.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 10225a0a35d0..65700511e074 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -239,6 +239,20 @@ struct drm_sched_backend_ops {
> * @timedout_job: Called when a job has taken too long to execute,
> * to trigger GPU recovery.
> *
> + * This method is called in a workqueue context.
> + *
> + * Drivers typically issue a reset to recover from GPU hangs, and this
> + * procedure usually goes through the following steps:
> + *
> + * 1. Stop the scheduler using drm_sched_stop(). This will park the
> + * scheduler thread and cancel the timeout work, guaranteeing that
> + * nothing is queued while we reset the hardware queue
> + * 2. Try to gracefully stop non-faulty jobs (optional)
> + * 3. Issue a GPU reset (driver-specific)
> + * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> + * 5. Restart the scheduler using drm_sched_start(). At that point, new
> + * jobs can be queued, and the scheduler thread is unblocked
> + *
> * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
> * and the underlying driver has started or completed recovery.
> *
> --
> 2.31.1
>
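For readers following along, the five steps in the new comment can be
sketched as below. This is a userspace mock, not kernel code: the
drm_sched_stop(), drm_sched_resubmit_jobs() and drm_sched_start() bodies
are stubs that only reuse the kernel names (with signatures as I
understand them at the time of this series), the two structs are reduced
to what the sketch needs, and every my_gpu_*() name is hypothetical. The
stubs record the order of the calls so the sequence can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the kernel enum; redefined here so the mock is self-contained. */
enum drm_gpu_sched_stat {
	DRM_GPU_SCHED_STAT_NONE,
	DRM_GPU_SCHED_STAT_NOMINAL,
	DRM_GPU_SCHED_STAT_ENODEV,
};

/* Mock types: the real structs carry far more state than this. */
struct drm_gpu_scheduler {
	bool running;   /* mock: true while new jobs may be queued */
	char trace[16]; /* mock: records the order of the steps */
};

struct drm_sched_job {
	struct drm_gpu_scheduler *sched;
};

/* Mock helper: append one step marker to the trace string. */
static void trace_step(struct drm_gpu_scheduler *sched, char step)
{
	size_t len = strlen(sched->trace);

	assert(len + 1 < sizeof(sched->trace));
	sched->trace[len] = step;
	sched->trace[len + 1] = '\0';
}

/* Stub for step 1: the real call parks the thread and cancels the timeout. */
static void drm_sched_stop(struct drm_gpu_scheduler *sched,
			   struct drm_sched_job *bad)
{
	(void)bad;
	sched->running = false;
	trace_step(sched, '1');
}

/* Hypothetical driver hook for step 2 (optional in the documentation). */
static void my_gpu_stop_healthy_jobs(struct drm_gpu_scheduler *sched)
{
	trace_step(sched, '2');
}

/* Hypothetical driver hook for step 3: the actual hardware reset. */
static void my_gpu_reset(struct drm_gpu_scheduler *sched)
{
	trace_step(sched, '3');
}

/* Stub for step 4. */
static void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
{
	trace_step(sched, '4');
}

/* Stub for step 5: after this, new jobs can be queued again. */
static void drm_sched_start(struct drm_gpu_scheduler *sched,
			    bool full_recovery)
{
	(void)full_recovery;
	sched->running = true;
	trace_step(sched, '5');
}

/* Hypothetical ->timedout_job() implementation following the five steps. */
static enum drm_gpu_sched_stat my_gpu_timedout_job(struct drm_sched_job *bad)
{
	struct drm_gpu_scheduler *sched = bad->sched;

	drm_sched_stop(sched, bad);          /* 1. stop the scheduler */
	my_gpu_stop_healthy_jobs(sched);     /* 2. optional graceful stop */
	my_gpu_reset(sched);                 /* 3. driver-specific reset */
	drm_sched_resubmit_jobs(sched);      /* 4. re-submit jobs */
	drm_sched_start(sched, true);        /* 5. restart the scheduler */

	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```

A real driver would also have to deal with locking, with a reset that
fails, and with several schedulers sharing one GPU, none of which the
mock attempts to cover.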
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch