[PATCH] drm/sched: Improve teardown documentation
Philipp Stanner
pstanner at redhat.com
Thu Nov 7 09:18:01 UTC 2024
On Tue, 2024-11-05 at 15:31 +0100, Philipp Stanner wrote:
> If jobs are still enqueued in struct drm_gpu_scheduler.pending_list
> when drm_sched_fini() gets called, those jobs will be leaked since
> that
> function stops both job-submission and (automatic) job-cleanup. It
> is,
> thus, up to the driver to take care of preventing leaks.
>
> The related function drm_sched_wqueue_stop() also prevents automatic
> job
> cleanup.
>
> Those pitfals are not reflected in the documentation, currently.
>
> Explicitly inform about the leak problem in the docstring of
> drm_sched_fini().
>
> Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and
> hint at the consequences for automatic cleanup.
>
> Signed-off-by: Philipp Stanner <pstanner at redhat.com>
Got an off-list (mail from dri-devel apparently got lost) RB from
Christian.
Applied to drm-misc-next.
P.
> ---
> Hi,
>
> in our discussion about my proposed waitque-cleanup for this problem
> Sima suggested [1] that we document the problems first and by doing
> so get
> to a consenus what the problems actually are and how we could solve
> them.
>
> This is my proposal for documenting the leaks on teardown. Feedback
> very
> welcome.
>
> P.
>
> [1]
> https://lore.kernel.org/dri-devel/ZtidJ8S9THvzkQ-6@phenom.ffwll.local/
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index e97c6c60bc96..3dfa9db89484 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1333,6 +1333,19 @@ EXPORT_SYMBOL(drm_sched_init);
> * @sched: scheduler instance
> *
> * Tears down and cleans up the scheduler.
> + *
> + * This stops submission of new jobs to the hardware through
> + * drm_sched_backend_ops.run_job(). Consequently,
> drm_sched_backend_ops.free_job()
> + * will not be called for all jobs still in
> drm_gpu_scheduler.pending_list.
> + * There is no solution for this currently. Thus, it is up to the
> driver to make
> + * sure that
> + * a) drm_sched_fini() is only called after for all submitted jobs
> + * drm_sched_backend_ops.free_job() has been called or that
> + * b) the jobs for which drm_sched_backend_ops.free_job() has not
> been called
> + * after drm_sched_fini() ran are freed manually.
> + *
> + * FIXME: Take care of the above problem and prevent this function
> from leaking
> + * the jobs in drm_gpu_scheduler.pending_list under any
> circumstances.
> */
> void drm_sched_fini(struct drm_gpu_scheduler *sched)
> {
> @@ -1428,8 +1441,10 @@ EXPORT_SYMBOL(drm_sched_wqueue_ready);
>
> /**
> * drm_sched_wqueue_stop - stop scheduler submission
> - *
> * @sched: scheduler instance
> + *
> + * Stops the scheduler from pulling new jobs from entities. It also
> stops
> + * freeing jobs automatically through
> drm_sched_backend_ops.free_job().
> */
> void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
> {
> @@ -1441,8 +1456,12 @@ EXPORT_SYMBOL(drm_sched_wqueue_stop);
>
> /**
> * drm_sched_wqueue_start - start scheduler submission
> - *
> * @sched: scheduler instance
> + *
> + * Restarts the scheduler after drm_sched_wqueue_stop() has stopped
> it.
> + *
> + * This function is not necessary for 'conventional' startup. The
> scheduler is
> + * fully operational after drm_sched_init() succeeded.
> */
> void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
> {
More information about the dri-devel
mailing list