[PATCH 5/5] drm/amdgpu: add gpu reset check before page retirement thread runs
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Jun 13 10:21:21 UTC 2024
On 13.06.24 at 04:25, YiPeng Chai wrote:
> If gpu is recovering, clear all message reset flags
> in fifo and wait for gpu to complete recovery.
>
> Signed-off-by: YiPeng Chai <YiPeng.Chai at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 341c9bd0d1a4..bf4f8d439ebe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2982,6 +2982,18 @@ static int amdgpu_ras_page_retirement_thread(void *param)
>
> atomic_dec(&con->page_retirement_req_cnt);
>
> + reinit_completion(&con->gpu_reset_completion);
> +
> + if (amdgpu_in_reset(adev) || atomic_read(&con->in_recovery)) {
It's illegal to call amdgpu_in_reset() from outside of the hw-specific
backends.
If you want to make the code mutually exclusive with GPU resets, you need
to grab the reset lock.
Regards,
Christian.
> + uint32_t reset;
> +
> + amdgpu_ras_clear_poison_fifo_msg_reset_flag(adev, &reset);
> +
> + if (!wait_for_completion_timeout(&con->gpu_reset_completion,
> + msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
> + dev_err(adev->dev, "Waiting for GPU to complete reset timeout!\n");
> + }
> +
> #ifdef HAVE_KFIFO_PUT_NON_POINTER
> if (!amdgpu_ras_get_poison_req(adev, &poison_msg))
> continue;