[PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

Chen, Guchun Guchun.Chen at amd.com
Wed Dec 15 13:28:14 UTC 2021


[Public]

Hi Christian,

Your question is a really good one. The change that unmaps MMIO at such an early phase comes from Andrey's patch "drm/amdgpu: Unmap all MMIO mappings". That patch went in about half a year ago, and everything looked fine until this case.

Regards,
Guchun

-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com> 
Sent: Wednesday, December 15, 2021 7:00 PM
To: Shi, Leslie <Yuliang.Shi at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Chen, Guchun <Guchun.Chen at amd.com>
Subject: Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

Am 15.12.21 um 09:46 schrieb Leslie Shi:
> [Why]
> In amdgpu_driver_load_kms, when amdgpu_device_init returns an error
> during driver modprobe, it starts the error handling path
> immediately and also calls into amdgpu_device_unmap_mmio to release
> mapped VRAM. However, in the following release callback, the driver
> still accesses the unmapped memory, e.g.
> vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini, so a kernel
> crash occurs.

Mhm, interesting workaround but I'm not sure that's the right thing to do.

Question is why are we unmapping the MMIO space on driver load failure so early in the first place? I mean don't we need to clean up a bit?

If that's really the way to go then we should at least add a comment explaining why it's done that way.

Regards,
Christian.

>
> [How]
> Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to prevent such a crash.
> A GPU initialization failure is tolerable, but it should never lead to a kernel crash.
>
> Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 651c7abfde03..7bf6aecdbb92 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags)
>   		/* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
>   		if (adev->rmmio && adev->runpm)
>   			pm_runtime_put_noidle(dev->dev);
> +
> +		drm_dev_unplug(dev);
>   		amdgpu_driver_unload_kms(dev);
>   	}
>   
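
For readers following the thread, here is a minimal userspace sketch of why marking the device as unplugged before teardown can avoid the crash. It assumes the teardown code guards its access to the mapping with a drm_dev_enter()-style check. All names below (fake_dev, fake_dev_enter, fake_dev_unplug, fake_unmap, fake_sw_fini) are hypothetical stand-ins, not the amdgpu or DRM API, and the real drm_dev_enter()/drm_dev_exit() pair is SRCU-based rather than a plain flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device with one CPU-visible mapping. */
struct fake_dev {
	bool unplugged;   /* set by fake_dev_unplug() */
	int *fw_shared;   /* stands in for fw_shared_cpu_addr */
};

/* Stand-in for a drm_dev_enter()-style check: false once unplugged. */
static bool fake_dev_enter(const struct fake_dev *dev)
{
	return !dev->unplugged;
}

/* Stand-in for drm_dev_unplug(): only marks the device as gone. */
static void fake_dev_unplug(struct fake_dev *dev)
{
	dev->unplugged = true;
}

/* Stand-in for the early unmap: the mapping is invalid afterwards. */
static void fake_unmap(struct fake_dev *dev)
{
	free(dev->fw_shared);
	dev->fw_shared = NULL;
}

/*
 * Stand-in for a sw_fini callback. Returns true if it touched the
 * mapping, false if the access was skipped because the device is
 * marked unplugged.
 */
static bool fake_sw_fini(struct fake_dev *dev)
{
	if (!fake_dev_enter(dev))
		return false;

	dev->fw_shared[0] = 0; /* would fault if the mapping were gone */
	return true;
}
```

In this sketch, calling fake_unmap() and then fake_sw_fini() without fake_dev_unplug() in between would dereference a NULL pointer, which mirrors the reported crash. With the unplug call added first, the guarded access is skipped. Whether that is the right fix, rather than delaying the unmap, is the open question raised above.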

