[PATCH] drm/amdgpu:fix gpu recover missing skipping(v2)
Yu, Xiangliang
Xiangliang.Yu at amd.com
Wed Nov 8 07:25:50 UTC 2017
Reviewed-By: Xiangliang Yu <Xiangliang.Yu at amd.com>
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Wednesday, November 08, 2017 3:08 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Liu, Monk <Monk.Liu at amd.com>
> Subject: [PATCH] drm/amdgpu:fix gpu recover missing skipping(v2)
>
> if app close CTX right after IB submit, gpu recover will fail to find out the
> entity behind this guilty job thus lead to no job skipping for this guilty job.
>
> to fix this corner case just move the increasement of
> job->karma out of the entity iteration.
>
> v2:
> only do karma increasment if bad->s_priority != KERNEL because we always
> consider KERNEL job be correct and always want to recover an unfinished
> kernel job (sometimes kernel job is interrupted by VF FLR or other GPU hang
> event)
>
> Change-Id: I33e9e959e182d7e002a2108e565cb898acac4f9c
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
> drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 7aa6455..c999026 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -463,7 +463,8 @@ void amd_sched_hw_job_reset(struct
> amd_gpu_scheduler *sched, struct amd_sched_jo
> }
> spin_unlock(&sched->job_list_lock);
>
> - if (bad) {
> + if (bad && bad->s_priority != AMD_SCHED_PRIORITY_KERNEL) {
> + atomic_inc(&bad->karma);
> /* don't increase @bad's karma if it's from KERNEL RQ,
> * becuase sometimes GPU hang would cause kernel jobs
> (like VM updating jobs)
> * corrupt but keep in mind that kernel jobs always
> considered good.
> @@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct
> amd_gpu_scheduler *sched, struct amd_sched_jo
> spin_lock(&rq->lock);
> list_for_each_entry_safe(entity, tmp, &rq->entities,
> list) {
> if (bad->s_fence->scheduled.context ==
> entity->fence_context) {
> - if (atomic_inc_return(&bad->karma) > bad-
> >sched->hang_limit)
> + if (atomic_read(&bad->karma) > bad-
> >sched->hang_limit)
> if (entity->guilty)
> atomic_set(entity-
> >guilty, 1);
> break;
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list