[PATCH v5 2/3] drm/sched: Adjust outdated docu for run_job()

Maíra Canal mcanal at igalia.com
Thu Feb 20 13:28:53 UTC 2025


Hi Philipp,

On 20/02/25 08:28, Philipp Stanner wrote:
> The documentation for drm_sched_backend_ops.run_job() mentions a certain
> function called drm_sched_job_recovery(). This function does not exist.
> What's actually meant is drm_sched_resubmit_jobs(), which is by now also
> deprecated.
> 
> Remove the mention of the removed function.
> 
> Discourage the behavior of drm_sched_backend_ops.run_job() being called
> multiple times for the same job.

It looks odd to me that this patch removes lines that were added in
patch 1/3. Maybe you could change the patchset order and place this one
as the first.

> 
> Signed-off-by: Philipp Stanner <phasta at kernel.org>
> ---
>   include/drm/gpu_scheduler.h | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 916279b5aa00..29e5bda91806 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -421,20 +421,27 @@ struct drm_sched_backend_ops {
>   
>   	/**
>   	 * @run_job: Called to execute the job once all of the dependencies
> -	 * have been resolved. This may be called multiple times, if
> -	 * timedout_job() has happened and drm_sched_job_recovery() decides to
> -	 * try it again.
> +	 * have been resolved.
> +	 *
> +	 * The deprecated drm_sched_resubmit_jobs() (called from
> +	 * drm_sched_backend_ops.timedout_job()) can invoke this again with the

I think it would be "@timedout_job".

> +	 * same parameters. Using this is discouraged because it, presumably,
> +	 * violates dma_fence rules.

I believe it would be "struct dma_fence".

> +	 *
> +	 * TODO: Document which fence rules above.
>   	 *
>   	 * @sched_job: the job to run
>   	 *
> -	 * Returns: dma_fence the driver must signal once the hardware has
> -	 *	completed the job ("hardware fence").
> -	 *
>   	 * Note that the scheduler expects to 'inherit' its own reference to
>   	 * this fence from the callback. It does not invoke an extra
>   	 * dma_fence_get() on it. Consequently, this callback must take a
>   	 * reference for the scheduler, and additional ones for the driver's
>   	 * respective needs.

Would it be possible to add a comment that `run_job()` must check if
`s_fence->finished.error` is different than 0? If you increase the karma
of a job and don't check for `s_fence->finished.error`, you might run a
cancelled job.

> +	 *
> +	 * Return:
> +	 * * On success: dma_fence the driver must signal once the hardware has
> +	 * completed the job ("hardware fence").

A suggestion: "the fence that the driver must signal once the hardware
has completed the job".

Best Regards,
- Maíra

> +	 * * On failure: NULL or an ERR_PTR.
>   	 */
>   	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>   



More information about the dri-devel mailing list