[PATCH] drm/scheduler: put killed job cleanup to worker

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Wed Jul 3 14:23:30 UTC 2019


On 7/3/19 6:28 AM, Lucas Stach wrote:
> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
> job finished fence signaling. As this might happen from IRQ context we
> now end up calling the GPU driver free_job callback in IRQ context, while
> all other paths call it from normal process context.
>
> Etnaviv in particular calls core kernel functions that are only valid to
> be called from process context when freeing the job. Other drivers might
> have similar issues, but I did not validate this. Fix this by punting
> the cleanup work into a work item, so the driver expectations are met.
>
> Signed-off-by: Lucas Stach <l.stach at pengutronix.de>
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++++++++++----------
>   1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 35ddbec1375a..ba4eb66784b9 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>   }
>   EXPORT_SYMBOL(drm_sched_entity_flush);
>   
> -/**
> - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs
> - *
> - * @f: signaled fence
> - * @cb: our callback structure
> - *
> - * Signal the scheduler finished fence when the entity in question is killed.
> - */
> +static void drm_sched_entity_kill_work(struct work_struct *work)
> +{
> +	struct drm_sched_job *job = container_of(work, struct drm_sched_job,
> +						 finish_work);
> +
> +	drm_sched_fence_finished(job->s_fence);
> +	WARN_ON(job->s_fence->parent);
> +	job->sched->ops->free_job(job);
> +}
> +
>   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   					  struct dma_fence_cb *cb)
>   {
>   	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
>   						 finish_cb);
>   
> -	drm_sched_fence_finished(job->s_fence);
> -	WARN_ON(job->s_fence->parent);
> -	job->sched->ops->free_job(job);
> +	schedule_work(&job->finish_work);
>   }
>   
>   /**
> @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
>   		drm_sched_fence_scheduled(s_fence);
>   		dma_fence_set_error(&s_fence->finished, -ESRCH);
>   
> +		/*
> +		 * Replace regular finish work function with one that just
> +		 * kills the job.
> +		 */
> +		job->finish_work.func = drm_sched_entity_kill_work;


I rechecked the latest code, and finish_work was removed in ffae3e5
('drm/scheduler: rework job destruction').

Andrey


> +
>   		/*
>   		 * When pipe is hanged by older entity, new entity might
>   		 * not even have chance to submit it's first job to HW

