[PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

Friedrich Vock friedrich.vock at gmx.de
Sat Jan 13 14:24:16 UTC 2024


On 13.01.24 15:02, Joshua Ashton wrote:
> We need to bump the karma of the drm_sched job in order for the context
> that we just recovered to get correct feedback that it is guilty of
> hanging.
>
> Without this feedback, the application may keep pushing through the soft
> recoveries, continually hanging the system with jobs that timeout.
>
> There is an accompanying Mesa/RADV patch here
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27050
> to properly handle device loss state when VRAM is not lost.
>
> With these, I was able to run Counter-Strike 2 and launch an application
> which can fault the GPU in a variety of ways, and still have Steam +
> Counter-Strike 2 + Gamescope (compositor) stay up and continue
> functioning on Steam Deck.
>
> Signed-off-by: Joshua Ashton <joshua at froggi.es>
Tested-by: Friedrich Vock <friedrich.vock at gmx.de>
>
> Cc: Friedrich Vock <friedrich.vock at gmx.de>
> Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: André Almeida <andrealmeid at igalia.com>
> Cc: stable at vger.kernel.org
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 25209ce54552..e87cafb5b1c3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -448,6 +448,8 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, struct amdgpu_job *job)
>   		dma_fence_set_error(fence, -ENODATA);
>   	spin_unlock_irqrestore(fence->lock, flags);
>
> +	if (job->vm)
> +		drm_sched_increase_karma(&job->base);
>   	atomic_inc(&ring->adev->gpu_reset_counter);
>   	while (!dma_fence_is_signaled(fence) &&
>   	       ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)


More information about the amd-gfx mailing list