[PATCH 2/2] drm/amdgpu: Process fences on IH overflow
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Jan 15 10:26:03 UTC 2024
On 14.01.24 at 14:00, Friedrich Vock wrote:
> If the IH ring buffer overflows, it's possible that fence signal events
> were lost. Check each ring for progress to prevent job timeouts/GPU
> hangs due to the fences staying unsignaled despite the work being done.
That's completely unnecessary and in some cases even harmful.
We already have a timeout handler for that, and overflows point to a severe
system problem, so they should never occur in a production system.
Regards,
Christian.
>
> Cc: Joshua Ashton <joshua at froggi.es>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: stable at vger.kernel.org
>
> Signed-off-by: Friedrich Vock <friedrich.vock at gmx.de>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index f3b0aaf3ebc6..2a246db1d3a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -209,6 +209,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>  {
>  	unsigned int count;
>  	u32 wptr;
> +	int i;
>
>  	if (!ih->enabled || adev->shutdown)
>  		return IRQ_NONE;
> @@ -227,6 +228,20 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
>  		ih->rptr &= ih->ptr_mask;
>  	}
>
> +	/* If the ring buffer overflowed, we might have lost some fence
> +	 * signal interrupts. Check if there was any activity so the signal
> +	 * doesn't get lost.
> +	 */
> +	if (ih->overflow) {
> +		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> +			struct amdgpu_ring *ring = adev->rings[i];
> +
> +			if (!ring || !ring->fence_drv.initialized)
> +				continue;
> +			amdgpu_fence_process(ring);
> +		}
> +	}
> +
>  	amdgpu_ih_set_rptr(adev, ih);
>  	wake_up_all(&ih->wait_process);
>
> --
> 2.43.0
>
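
For reference, the idea in the quoted hunk can be sketched in isolation as a
small user-space model. This is only an illustration under stated assumptions,
not the driver code: struct model_ring, model_fence_process(),
model_handle_overflow() and MAX_RINGS are hypothetical stand-ins for
struct amdgpu_ring, amdgpu_fence_process(), the added loop and
AMDGPU_MAX_RINGS, and fence state is reduced to two sequence counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RINGS 4 /* hypothetical; stands in for AMDGPU_MAX_RINGS */

/* Hypothetical, simplified model of one ring's fence state. */
struct model_ring {
	bool initialized;      /* mirrors the fence_drv.initialized check */
	uint32_t hw_seq;       /* last sequence number completed by hardware */
	uint32_t signaled_seq; /* last sequence number signaled by software */
};

/*
 * Signal everything the hardware has completed so far.
 * Returns the number of fences that were newly signaled.
 */
static uint32_t model_fence_process(struct model_ring *ring)
{
	uint32_t newly_signaled = ring->hw_seq - ring->signaled_seq;

	ring->signaled_seq = ring->hw_seq;
	return newly_signaled;
}

/*
 * If the interrupt ring overflowed, fence interrupts may have been dropped,
 * so poll every initialized ring instead of relying on the lost events.
 * Returns the total number of fences signaled by the poll.
 */
static uint32_t model_handle_overflow(struct model_ring *rings[], int n,
				      bool overflow)
{
	uint32_t total = 0;
	int i;

	assert(n <= MAX_RINGS);

	if (!overflow)
		return 0;

	for (i = 0; i < n; ++i) {
		struct model_ring *ring = rings[i];

		/* Skip unused slots and rings without fence state. */
		if (!ring || !ring->initialized)
			continue;
		total += model_fence_process(ring);
	}
	return total;
}
```

In this model, calling model_handle_overflow() with overflow set to false
leaves every ring untouched, while calling it with overflow set to true
catches up each initialized ring to its hw_seq. Whether such polling is
desirable in the real driver is exactly what the reply above disputes.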