[PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

Zhang, Owen(SRDC) Owen.Zhang2 at amd.com
Mon Apr 28 10:12:42 UTC 2025


[AMD Official Use Only - AMD Internal Distribution Only]

Hi @Koenig, Christian,

Looking for your expertise on this. Thanks for your support.


Rgds/Owen

From: Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>
Sent: Thursday, April 24, 2025 11:39 AM
To: Christian König <ckoenig.leichtzumerken at gmail.com>; amd-gfx at lists.freedesktop.org; Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>
Cc: Zhao, Victor <Victor.Zhao at amd.com>; Chang, HaiJun <HaiJun.Chang at amd.com>; Deng, Emily <Emily.Deng at amd.com>; Zhang, Owen(SRDC) <Owen.Zhang2 at amd.com>
Subject: Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error


Ping... @Koenig, Christian

Thanks
Sam

From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>
Date: Wednesday, April 23, 2025 at 14:59
To: Christian König <ckoenig.leichtzumerken at gmail.com>, amd-gfx at lists.freedesktop.org
Cc: Zhao, Victor <Victor.Zhao at amd.com>, Chang, HaiJun <HaiJun.Chang at amd.com>, Deng, Emily <Emily.Deng at amd.com>, Zhang, Owen(SRDC) <Owen.Zhang2 at amd.com>
Subject: Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error

Hi @Christian König,

In a QEMU VM environment, when request_irq() is called in the guest KMD, QEMU enables the interrupt for the device on the host.

When the guest hibernates and then resumes on a new vGPU without calling request_irq() on that vGPU, the new vGPU's interrupt is never enabled, so the IH handler in the guest KMD is not called.

This change ensures that request_irq() is called on resume for the new vGPU.
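The failure described here can be reduced to a small user-space model. This is a hypothetical sketch, not amdgpu or QEMU code: it assumes, following Sam's explanation, that the host enables interrupt delivery for a vGPU only when the guest calls request_irq() while bound to that vGPU. `struct host_model`, `struct guest_model` and the `guest_*` / `resume_on_new_vgpu` helpers are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VGPUS 4

/* Hypothetical host side: one "interrupt enabled" bit per vGPU, set only
 * when the guest registers its handler while bound to that vGPU. */
struct host_model {
	bool irq_enabled[MAX_VGPUS];
};

/* Hypothetical guest side: which vGPU it is bound to and whether its
 * IH handler is currently registered. */
struct guest_model {
	int vgpu;
	bool handler_registered;
};

/* Models request_irq(): the host enables the interrupt on the vGPU the
 * guest is bound to at the time of the call. */
static void guest_request_irq(struct guest_model *g, struct host_model *h)
{
	g->handler_registered = true;
	h->irq_enabled[g->vgpu] = true;
}

/* Models free_irq() on the guest side. */
static void guest_free_irq(struct guest_model *g)
{
	g->handler_registered = false;
}

/* True if an interrupt raised by the bound vGPU reaches the handler. */
static bool irq_delivered(const struct guest_model *g,
			  const struct host_model *h)
{
	return g->handler_registered && h->irq_enabled[g->vgpu];
}

/* Load on old_vgpu, hibernate, resume on new_vgpu.
 * reregister == false: handler is kept across suspend (pre-patch).
 * reregister == true:  free on suspend, request again on resume (patch). */
static bool resume_on_new_vgpu(int old_vgpu, int new_vgpu, bool reregister)
{
	struct host_model h = {0};
	struct guest_model g = { .vgpu = old_vgpu, .handler_registered = false };

	guest_request_irq(&g, &h);		/* driver load */
	if (reregister)
		guest_free_irq(&g);		/* suspend */
	g.vgpu = new_vgpu;			/* resume bound to another vGPU */
	if (reregister)
		guest_request_irq(&g, &h);	/* resume */
	return irq_delivered(&g, &h);
}
```

In this model `resume_on_new_vgpu(0, 1, false)` returns false, because the interrupt was only ever enabled on vGPU 0, while `resume_on_new_vgpu(0, 1, true)` returns true. Resuming on the same vGPU works either way, which matches the report that the problem only appears after switching to a new vGPU.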

Regards
Sam

From: Christian König <ckoenig.leichtzumerken at gmail.com>
Date: Wednesday, April 16, 2025 at 21:54
To: Zhang, GuoQing (Sam) <GuoQing.Zhang at amd.com>, amd-gfx at lists.freedesktop.org
Cc: Zhao, Victor <Victor.Zhao at amd.com>, Chang, HaiJun <HaiJun.Chang at amd.com>, Deng, Emily <Emily.Deng at amd.com>
Subject: Re: [PATCH 6/6] drm/amdgpu: fix fence fallback timer expired error
On 14.04.25 at 12:46, Samuel Zhang wrote:
> IH is not working after switching to a new GPU index for the first time.
> The IH handler function needs to be re-registered with the kernel after
> switching to the new GPU index.

Why?

Christian.

>
> Signed-off-by: Samuel Zhang <guoqing.zhang at amd.com>
> Change-Id: Idece1c8fce24032fd08f5a8b6ac23793c51e56dd
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c |  7 +++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/vega20_ih.c  | 18 ++++++++++++++++--
>  3 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 19ce4da285e8..2292245a0c5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -326,7 +326,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>        return r;
>  }
>
> -void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
> +void amdgpu_irq_uninstall(struct amdgpu_device *adev)
>  {
>        if (adev->irq.installed) {
>                free_irq(adev->irq.irq, adev_to_drm(adev));
> @@ -334,7 +334,10 @@ void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
>                if (adev->irq.msi_enabled)
>                        pci_free_irq_vectors(adev->pdev);
>        }
> -
> +}
> +void amdgpu_irq_fini_hw(struct amdgpu_device *adev)
> +{
> +     amdgpu_irq_uninstall(adev);
>        amdgpu_ih_ring_fini(adev, &adev->irq.ih_soft);
>        amdgpu_ih_ring_fini(adev, &adev->irq.ih);
>        amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index 04c0b4fa17a4..c6e6681b4f71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -123,6 +123,7 @@ extern const int node_id_to_phys_map[NODEID_MAX];
>  void amdgpu_irq_disable_all(struct amdgpu_device *adev);
>
>  int amdgpu_irq_init(struct amdgpu_device *adev);
> +void amdgpu_irq_uninstall(struct amdgpu_device *adev);
>  void amdgpu_irq_fini_sw(struct amdgpu_device *adev);
>  void amdgpu_irq_fini_hw(struct amdgpu_device *adev);
>  int amdgpu_irq_add_id(struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> index faa0dd75dd6d..ef996505e4dc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
> @@ -643,12 +643,26 @@ static int vega20_ih_hw_fini(struct amdgpu_ip_block *ip_block)
>
>  static int vega20_ih_suspend(struct amdgpu_ip_block *ip_block)
>  {
> -     return vega20_ih_hw_fini(ip_block);
> +     struct amdgpu_device *adev = ip_block->adev;
> +     int r = 0;
> +
> +     r = vega20_ih_hw_fini(ip_block);
> +     amdgpu_irq_uninstall(adev);
> +     return r;
>  }
>
>  static int vega20_ih_resume(struct amdgpu_ip_block *ip_block)
>  {
> -     return vega20_ih_hw_init(ip_block);
> +     struct amdgpu_device *adev = ip_block->adev;
> +     int r = 0;
> +
> +     r = amdgpu_irq_init(adev);
> +     if (r) {
> +             dev_err(adev->dev, "amdgpu_irq_init failed in %s, %d\n", __func__, r);
> +             return r;
> +     }
> +     r = vega20_ih_hw_init(ip_block);
> +     return r;
>  }
>
>  static bool vega20_ih_is_idle(struct amdgpu_ip_block *ip_block)
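The patch moves the free_irq() part of amdgpu_irq_fini_hw() into amdgpu_irq_uninstall() and also calls it from vega20_ih_suspend(). If amdgpu_irq_fini_hw() later runs without an intervening resume (an assumed scenario here), the function is called twice, so it has to be idempotent. The sketch below is a hypothetical user-space model of that guard; it assumes the `installed` flag is cleared inside the guarded block, a line that falls between the two quoted hunks and is therefore not visible in the patch. `struct irq_state` and the `model_*` functions are illustrative names, not amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of adev->irq. */
struct irq_state {
	bool installed;
	bool msi_enabled;
	int free_irq_calls;	/* counts simulated free_irq() calls */
	int free_vectors_calls;	/* counts simulated pci_free_irq_vectors() calls */
};

/* Mirrors the shape of amdgpu_irq_uninstall(): release resources only if
 * installed, and clear the flag so a second call is a no-op.
 * ASSUMPTION: the flag is cleared inside the guarded block (not shown in
 * the quoted hunks). */
static void model_irq_uninstall(struct irq_state *s)
{
	if (s->installed) {
		s->free_irq_calls++;
		s->installed = false;
		if (s->msi_enabled)
			s->free_vectors_calls++;
	}
}

/* Suspend path from the patch followed by hardware teardown without a
 * resume in between: uninstall runs twice. */
static void model_suspend_then_fini_hw(struct irq_state *s)
{
	model_irq_uninstall(s);	/* vega20_ih_suspend() */
	model_irq_uninstall(s);	/* amdgpu_irq_fini_hw() */
}
```

Starting from `{ .installed = true, .msi_enabled = true }`, the model records exactly one free_irq() and one pci_free_irq_vectors() call, which is the property the `installed` check is meant to guarantee.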


More information about the amd-gfx mailing list