[PATCH] drm/amdgpu:fix gpu recover missing skipping

Yu, Xiangliang Xiangliang.Yu at amd.com
Wed Nov 8 07:13:42 UTC 2017


Reviewed-and-tested-by: Xiangliang Yu <Xiangliang.Yu at amd.com>


> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Wednesday, November 08, 2017 2:39 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Liu, Monk <Monk.Liu at amd.com>
> Subject: [PATCH] drm/amdgpu:fix gpu recover missing skipping
> 
> If an app closes its CTX right after submitting an IB, GPU recovery fails to
> find the entity/ctx behind the guilty job, so the scheduler fails to skip the
> bad job.
> 
> To fix this corner case, move the job->karma increment out of the condition
> that requires the backing entity to be found. That way the job itself is
> marked "guilty" regardless.
> 
> Change-Id: Ia30f02df9297a343d6d8dace496e237827dd1548
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 7aa6455..720fd1b 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -464,6 +464,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
>  	spin_unlock(&sched->job_list_lock);
> 
>  	if (bad) {
> +		atomic_inc(&bad->karma);
>  		/* don't increase @bad's karma if it's from KERNEL RQ,
>  		 * becuase sometimes GPU hang would cause kernel jobs (like VM updating jobs)
>  		 * corrupt but keep in mind that kernel jobs always considered good.
> @@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
>  			spin_lock(&rq->lock);
>  			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>  				if (bad->s_fence->scheduled.context == entity->fence_context) {
> -				    if (atomic_inc_return(&bad->karma) > bad->sched->hang_limit)
> +				    if (atomic_read(&bad->karma) > bad->sched->hang_limit)
>  						if (entity->guilty)
>  							atomic_set(entity->guilty, 1);
>  					break;
> --
> 2.7.4
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
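
For readers following the thread, the corner case described in the commit
message can be modeled in user space roughly as below. This is only a
sketch under stated assumptions, not the kernel code: the sketch_* types
and functions are hypothetical stand-ins, C11 atomics replace the kernel's
atomic_t helpers, and a plain array replaces the run-queue entity list.
It contrasts the pre-patch flow (karma bumped only when the backing entity
is found) with the patched flow (karma bumped first, then the lookup).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler structures. */
struct sketch_entity {
	unsigned long fence_context;
	atomic_int guilty;
};

struct sketch_job {
	unsigned long context;	/* fence context the job was submitted from */
	atomic_int karma;
	int hang_limit;
};

/* Pre-patch flow: karma only rises if the matching entity still exists. */
static void sketch_reset_old(struct sketch_job *bad,
			     struct sketch_entity *ents, size_t n)
{
	assert(bad != NULL);
	for (size_t i = 0; i < n; i++) {
		if (bad->context == ents[i].fence_context) {
			/* fetch_add returns the old value, so add 1 to compare */
			if (atomic_fetch_add(&bad->karma, 1) + 1 > bad->hang_limit)
				atomic_store(&ents[i].guilty, 1);
			break;
		}
	}
}

/* Patched flow: bump karma unconditionally, then look for the entity. */
static void sketch_reset_new(struct sketch_job *bad,
			     struct sketch_entity *ents, size_t n)
{
	assert(bad != NULL);
	atomic_fetch_add(&bad->karma, 1);
	for (size_t i = 0; i < n; i++) {
		if (bad->context == ents[i].fence_context) {
			if (atomic_load(&bad->karma) > bad->hang_limit)
				atomic_store(&ents[i].guilty, 1);
			break;
		}
	}
}

/*
 * Model the corner case: the job came from context 42, but the app has
 * already closed that CTX, so no entity with that context remains.
 * Returns the job's karma after one reset pass.
 */
static int sketch_demo(bool patched)
{
	struct sketch_entity ents[] = { { .fence_context = 7 } };
	struct sketch_job job = { .context = 42, .hang_limit = 0 };

	if (patched)
		sketch_reset_new(&job, ents, 1);
	else
		sketch_reset_old(&job, ents, 1);
	return atomic_load(&job.karma);
}
```

In this model sketch_demo(false) leaves the karma at 0, so nothing marks
the job as bad, while sketch_demo(true) raises it to 1 even though the
entity lookup finds nothing.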

