[PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

Christian König ckoenig.leichtzumerken at gmail.com
Thu Dec 16 08:38:54 UTC 2021


The !drm_dev_enter() is quite unusual and deserves a comment explaining 
what's going on here.

Apart from that it looks good with the typos fixed I think.

Christian.

Am 16.12.21 um 08:27 schrieb Chen, Guchun:
> [Public]
>
> My BAD to misunderstand this.
>
> There are both spell typos in patch subject and body, s/iff/if.
>
> The patch is: Reviewed-by: Guchun Chen <guchun.chen at amd.com>
>
> Please wait for the ack from Andrey and Christian before pushing this.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Shi, Leslie <Yuliang.Shi at amd.com>
> Sent: Thursday, December 16, 2021 3:00 PM
> To: Chen, Guchun <Guchun.Chen at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure
>
> [Public]
>
> Hi Guchun,
>
> As Andrey says, "we should not call amdgpu_device_unmap_mmio unless device is unplugged", I think we should call amdgpu_device_unmap_mmio() only if device is unplugged (drm_dev_enter() return false) .
>
> +if (!drm_dev_enter(adev_to_drm(adev), &idx))
> +	amdgpu_device_unmap_mmio(adev);
> + else
> # 	drm_dev_exit(idx);
>
>
> Regards,
> Leslie
>
> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen at amd.com>
> Sent: Thursday, December 16, 2021 2:46 PM
> To: Shi, Leslie <Yuliang.Shi at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure
>
> [Public]
>
> Hi Leslie,
>
> I think we need to modify it like:
>
> +if (drm_dev_enter(adev_to_drm(adev), &idx)) {
> +	amdgpu_device_unmap_mmio(adev);
> +	drm_dev_exit(idx);
> +}
>
> Also you need to credit Andrey a 'suggested-by' in your patch.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Shi, Leslie <Yuliang.Shi at amd.com>
> Sent: Thursday, December 16, 2021 2:14 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Chen, Guchun <Guchun.Chen at amd.com>; Shi, Leslie <Yuliang.Shi at amd.com>
> Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure
>
> [Why]
> In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.
>
> [How]
> call amdgpu_device_unmap_mmio() iff device is unplugged to prevent invalid memory address in
> vcn_v3_0_sw_fini() when GPU initialization failure.
>
> Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index fb03d75880ec..d3656e7b60c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
>    */
>   void amdgpu_device_fini_hw(struct amdgpu_device *adev)  {
> +	int idx;
> +
>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>   	flush_delayed_work(&adev->delayed_init_work);
>   	if (adev->mman.initialized) {
> @@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   
>   	amdgpu_gart_dummy_page_fini(adev);
>   
> -	amdgpu_device_unmap_mmio(adev);
> +	if (!drm_dev_enter(adev_to_drm(adev), &idx))
> +		amdgpu_device_unmap_mmio(adev);
> +	else
> +		drm_dev_exit(idx);
> +
>   }
>   
>   void amdgpu_device_fini_sw(struct amdgpu_device *adev)
> --
> 2.25.1



More information about the amd-gfx mailing list