[PATCH v3] drm/amdgpu: Init zone device and drm client after mode-1 reset on reload

Felix Kuehling felix.kuehling at amd.com
Tue Mar 12 21:40:08 UTC 2024


On 2024-03-08 14:00, Ahmad Rehman wrote:
> In a passthrough environment, when amdgpu is reloaded after an unload, a
> mode-1 reset is triggered after initializing the necessary IPs. That init
> does not include KFD; KFD init waits until the reset has completed. KFD
> init is called in the reset handler, but in this case the zone device and
> drm client are not initialized, so running an app causes a kernel panic.
>
> v2: Remove the KFD init condition from amdgpu_amdkfd_drm_client_create,
> as the previous version has the potential to create the DRM client twice.
>
> v3: The v2 patch results in an SDMA engine hang, as DRM open causes a VM
> clear to go to SDMA before SDMA init. Add a condition to drm client
> creation, on top of v1, to guard against the drm client being created
> multiple times.
>
> Signed-off-by: Ahmad Rehman <Ahmad.Rehman at amd.com>

Reviewed-by: Felix Kuehling <felix.kuehling at amd.com>
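
For anyone reading along in the archive, the v3 guard can be pictured with a
small user-space sketch. This is not driver code: every mock_* name below is
hypothetical, and the assignment to client.dev only stands in for
drm_client_init(), on the assumption that a successful init leaves
client.dev set, which is what the new check relies on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the kernel definitions. */
struct mock_client {
	void *dev;              /* NULL until the client has been initialized */
};

struct mock_kfd {
	bool init_complete;     /* set once KFD device init has finished */
	struct mock_client client;
};

struct mock_adev {
	struct mock_kfd kfd;
	int ddev;               /* placeholder for the embedded drm device */
	int client_inits;       /* counts how often the client was really set up */
};

/*
 * Mirrors the shape of the v3 check: do nothing if KFD is not ready yet,
 * and do nothing if the client is already bound to a device.
 */
static int mock_drm_client_create(struct mock_adev *adev)
{
	if (!adev->kfd.init_complete || adev->kfd.client.dev)
		return 0;

	/* Assumed effect of a successful drm_client_init(). */
	adev->kfd.client.dev = &adev->ddev;
	adev->client_inits++;
	return 0;
}
```

In this model a call made before KFD init is a no-op, and a second call after
a successful one is also a no-op, so it no longer matters whether the probe
path or the reset handler gets there first.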


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 5 ++++-
>   2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index f5f2945711be..4389d24f36e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -146,8 +146,8 @@ int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev)
>   {
>   	int ret;
>   
> -	if (!adev->kfd.init_complete)
> -		return 0;
> +	if (!adev->kfd.init_complete || adev->kfd.client.dev)
> +		return 0;
>   
>   	ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd",
>   			      &kfd_client_funcs);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 15b188aaf681..80b9642f2bc4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2479,8 +2479,11 @@ static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
>   	}
>   	for (i = 0; i < mgpu_info.num_dgpu; i++) {
>   		adev = mgpu_info.gpu_ins[i].adev;
> -		if (!adev->kfd.init_complete)
> +		if (!adev->kfd.init_complete) {
> +			kgd2kfd_init_zone_device(adev);
>   			amdgpu_amdkfd_device_init(adev);
> +			amdgpu_amdkfd_drm_client_create(adev);
> +		}
>   		amdgpu_ttm_set_buffer_funcs_status(adev, true);
>   	}
>   }
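
The ordering added to the delayed reset handler can likewise be sketched in
user space. Again this is only an illustration under stated assumptions: the
sim_* names are made up, KFD init is assumed to always succeed, and each
helper just records that it ran instead of touching hardware.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_STEPS 8

/* Hypothetical device model; field names are illustrative only. */
struct sim_dev {
	bool kfd_init_complete;
	bool client_created;
	const char *steps[SIM_MAX_STEPS];   /* order in which init steps ran */
	size_t nsteps;
};

static void sim_record(struct sim_dev *d, const char *step)
{
	if (d->nsteps < SIM_MAX_STEPS)
		d->steps[d->nsteps++] = step;
}

/* Stand-in for kgd2kfd_init_zone_device(). */
static void sim_init_zone_device(struct sim_dev *d)
{
	sim_record(d, "zone_device");
}

/* Stand-in for amdgpu_amdkfd_device_init(); assumed to succeed here. */
static void sim_kfd_device_init(struct sim_dev *d)
{
	sim_record(d, "kfd_init");
	d->kfd_init_complete = true;
}

/* Stand-in for amdgpu_amdkfd_drm_client_create(), including its guard. */
static void sim_drm_client_create(struct sim_dev *d)
{
	if (!d->kfd_init_complete || d->client_created)
		return;
	d->client_created = true;
	sim_record(d, "drm_client");
}

/* Same loop shape as the patched handler: only late-init devices are touched. */
static void sim_delayed_reset_handler(struct sim_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].kfd_init_complete) {
			sim_init_zone_device(&devs[i]);
			sim_kfd_device_init(&devs[i]);
			sim_drm_client_create(&devs[i]);
		}
	}
}
```

A device whose KFD init already completed before the handler runs is skipped
entirely, and running the handler a second time adds no further steps.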

