[PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload

Rehman, Ahmad Ahmad.Rehman at amd.com
Tue Mar 5 00:20:14 UTC 2024


[AMD Official Use Only - General]

Hey,


Due to mode-1 reset (pending_reset), the amdgpu_amdkfd_device_init will not be called and hence adev->kfd.init_complete will not be set. The function amdgpu_amdkfd_drm_client_create has condition:
if (!adev->kfd.init_complete)
                return 0;
So, in probe function, when we return from device_init the KFD is not initialized and amdgpu_amdkfd_drm_client_create returns without doing anything.

Thanks,
Ahmad

________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Monday, March 4, 2024 6:39 PM
To: Rehman, Ahmad <Ahmad.Rehman at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Wan, Gavin <Gavin.Wan at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload


On 2024-03-04 17:05, Ahmad Rehman wrote:
> In passthrough environment, when amdgpu is reloaded after unload, mode-1
> is triggered after initializing the necessary IPs, That init does not
> include KFD, and KFD init waits until the reset is completed. KFD init
> is called in the reset handler, but in this case, the zone device and
> drm client is not initialized, causing app to create kernel panic.
>
> Signed-off-by: Ahmad Rehman <Ahmad.Rehman at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 15b188aaf681..80b9642f2bc4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2479,8 +2479,11 @@ static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
>        }
>        for (i = 0; i < mgpu_info.num_dgpu; i++) {
>                adev = mgpu_info.gpu_ins[i].adev;
> -             if (!adev->kfd.init_complete)
> +             if (!adev->kfd.init_complete) {
> +                     kgd2kfd_init_zone_device(adev);
>                        amdgpu_amdkfd_device_init(adev);
> +                     amdgpu_amdkfd_drm_client_create(adev);
I don't see what's preventing the DRM client initialization in the
reset-on-driver-load case. It only needs to be created once and that
happens in amdgpu_pci_probe. Am I missing anything?

Regards,
   Felix


> +             }
>                amdgpu_ttm_set_buffer_funcs_status(adev, true);
>        }
>   }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240305/774ed984/attachment-0001.htm>


More information about the amd-gfx mailing list