[PATCH v3 12/13] drm/sched/doc: Add Entity teardown documentation

Danilo Krummrich me at dakr.org
Sat Sep 16 18:06:15 UTC 2023


On 9/12/23 04:16, Matthew Brost wrote:
> Provide documentation to guide in ways to teardown an entity.
> 
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
>   Documentation/gpu/drm-mm.rst             |  6 ++++++
>   drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
>   2 files changed, 25 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
>   .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>      :doc: Overview
>   
> +Entity teardown
> +---------------

While I think it is good to document this as well, my concern was more about tearing
down the drm_gpu_scheduler. (See also my response to patch 11 of this series.)

How do we ensure that the pending_list is actually empty before calling
drm_sched_fini()? If we don't, we potentially leak memory.

For instance, we could let drm_sched_fini() (or a separate drm_sched_teardown())
cancel run work first and leave free work running until the pending_list is empty.
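
To make the ordering concrete, here is a rough userspace model of what I have
in mind. It is only a sketch: every toy_* name is hypothetical and none of it
is existing scheduler API. It stops submission first, keeps the free step
going, and reports how many jobs are still on the pending list, so teardown is
only complete once that count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_job {
	struct toy_job *next;
	bool done;			/* stands in for a signaled hardware fence */
};

struct toy_sched {
	struct toy_job *pending;	/* models the pending_list */
	bool run_stopped;		/* models "run work cancelled" */
	size_t freed;			/* jobs released by the free step */
};

/* Models run work: queue a job on the pending list unless submission stopped. */
static bool toy_run_job(struct toy_sched *s)
{
	struct toy_job *job;

	assert(s != NULL);
	if (s->run_stopped)
		return false;

	job = calloc(1, sizeof(*job));
	if (!job)
		return false;

	job->next = s->pending;
	s->pending = job;
	return true;
}

/* Models the hardware finishing every job it was given. */
static void toy_complete_all(struct toy_sched *s)
{
	for (struct toy_job *j = s->pending; j; j = j->next)
		j->done = true;
}

/* Models free work: release every completed job on the pending list. */
static void toy_free_done_jobs(struct toy_sched *s)
{
	struct toy_job **link = &s->pending;

	while (*link) {
		struct toy_job *job = *link;

		if (job->done) {
			*link = job->next;
			free(job);
			s->freed++;
		} else {
			link = &job->next;
		}
	}
}

/*
 * Models the proposed teardown: stop run work first, then free what has
 * completed. Returns the number of jobs still pending; the caller must not
 * finalize the scheduler until this returns 0.
 */
static size_t toy_sched_teardown(struct toy_sched *s)
{
	size_t left = 0;

	s->run_stopped = true;
	toy_free_done_jobs(s);
	for (struct toy_job *j = s->pending; j; j = j->next)
		left++;
	return left;
}
```

In this model a caller would invoke toy_sched_teardown() again after the
hardware has completed its jobs; a real implementation would presumably wait
for the free work instead of polling.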

If we think drivers should take care of this themselves (e.g. by reference counting
jobs per scheduler), we should document that and explain why the scheduler can't do
it for us.
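
For reference, the driver-side variant could look roughly like the sketch
below. It is a userspace illustration under my own assumptions: the
toy_driver_sched wrapper and its helpers are hypothetical names, not existing
API. The driver counts jobs per scheduler and refuses to finalize while any
job is still outstanding.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical driver-side wrapper; not part of the DRM scheduler API. */
struct toy_driver_sched {
	atomic_int jobs_in_flight;	/* jobs handed over but not yet freed */
	bool finalized;
};

/* Call when a job is handed to the scheduler. */
static void toy_job_get(struct toy_driver_sched *ds)
{
	atomic_fetch_add(&ds->jobs_in_flight, 1);
}

/*
 * Call from the driver's free-job path.
 * Returns true when the last outstanding job has been released.
 */
static bool toy_job_put(struct toy_driver_sched *ds)
{
	int old = atomic_fetch_sub(&ds->jobs_in_flight, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}

/* Tear down only when no job is outstanding; otherwise report -EBUSY. */
static int toy_driver_sched_fini(struct toy_driver_sched *ds)
{
	if (atomic_load(&ds->jobs_in_flight) != 0)
		return -EBUSY;

	ds->finalized = true;	/* a real driver would call drm_sched_fini() here */
	return 0;
}
```

Every driver would have to carry this bookkeeping itself, which is why I think
we should either do it in the scheduler or document why we can't.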

> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> +   :doc: Entity teardown
> +
>   Scheduler Function References
>   -----------------------------
>   
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
>    *
>    */
>   
> +/**
> + * DOC: Entity teardown
> + *
> + * Drivers can tear down an entity for several reasons. Typically, a user
> + * closes the entity via an IOCTL, the FD associated with the entity is
> + * closed, or the entity encounters an error. The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
>   #include <linux/kthread.h>
>   #include <linux/slab.h>
>   #include <linux/completion.h>

