[PATCH 2/2] drm/sched: clarify the documentation on drm_sched_entity_error

Mon Nov 25 10:00:20 UTC 2024

On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote:
> Sima requested that in a discussion, just copy&paste my explanation
> from
> the mail.
> 
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 58c8161289fe..571e2f2365a1 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -166,8 +166,21 @@ bool drm_sched_entity_is_ready(struct
> drm_sched_entity *entity)
>   * drm_sched_entity_error - return error of last scheduled job
>   * @entity: scheduler entity to check

Please move the "Returns:" section to here, so we get a unified style
for the scheduler.

"Returns: the negative error code of the last scheduled job, 0 if there
has been no error, or no job yet."

Regarding your sentence from below:

> "Returns the error of the last scheduled job. Result can change any
> time when
> + * new jobs are pushed to the hw.

what does the last sentence mean? It seems you just want to say that
each time the function is called it can return a new error code?

IMO it should be relatively obvious that this function is racy. If you
want to emphasize it nevertheless I'd do that in a sentence separate
from the "Returns: " section

>   *
> - * Opportunistically return the error of the last scheduled job.
> Result can
> - * change any time when new jobs are pushed to the hw.
> + * Drivers should use this function in two ways:

maybe "in one of two ways" ?

> + *
> + * 1. In it's prepare callback so that when one submission fails all
> following

s/it's/their

> + * from the same ctx are marked with an error number as well.
> + *
> + * This is intentionally done in a driver callback so that driver
> decides if

s/so that driver/so that the driver

> + * they want subsequent submissions to fail or not. That can be
> helpful for
> + * example for in kernel paging queues where submissions don't

s/for in kernel/in kernel

> depend on each
> + * other and a failed submission shouldn't cancel all following.
> + *
> + * 2. In it's submission IOCTL to reject new submissions and inform
> userspace

s/In it's/In their

> + * that it needs to kick of some error handling.
> + *
> + * Returns the error of the last scheduled job. Result can change
> any time when
> + * new jobs are pushed to the hw.
>   */
>  int drm_sched_entity_error(struct drm_sched_entity *entity)
>  {

btw I talked to Dave about the other patch. I think if you provide a v2
with Janis suggestions and mine here we can merge this

Danke,
P.