[PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Thu Dec 16 16:56:22 UTC 2021
Maybe we just should use drm_dev_is_unplugged() for this particular case
because, there would be no race since when device is unplugged it's
final. It's the other way around that requires strict drm_dev_enter/exit
scope.
Andrey
On 2021-12-16 3:38 a.m., Christian König wrote:
> The !drm_dev_enter() is quite unusual and deserves a comment
> explaining what's going on here.
>
> Apart from that it looks good with the typos fixed I think.
>
> Christian.
>
> Am 16.12.21 um 08:27 schrieb Chen, Guchun:
>> [Public]
>>
>> My BAD to misunderstand this.
>>
>> There are both spell typos in patch subject and body, s/iff/if.
>>
>> The patch is: Reviewed-by: Guchun Chen <guchun.chen at amd.com>
>>
>> Please wait for the ack from Andrey and Christian before pushing this.
>>
>> Regards,
>> Guchun
>>
>> -----Original Message-----
>> From: Shi, Leslie <Yuliang.Shi at amd.com>
>> Sent: Thursday, December 16, 2021 3:00 PM
>> To: Chen, Guchun <Guchun.Chen at amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>;
>> Deucher, Alexander <Alexander.Deucher at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio()
>> iff device is unplugged to prevent crash in GPU initialization failure
>>
>> [Public]
>>
>> Hi Guchun,
>>
>> As Andrey says, "we should not call amdgpu_device_unmap_mmio unless
>> device is unplugged", I think we should call
>> amdgpu_device_unmap_mmio() only if device is unplugged
>> (drm_dev_enter() return false) .
>>
>> +if (!drm_dev_enter(adev_to_drm(adev), &idx))
>> + amdgpu_device_unmap_mmio(adev);
>> + else
>> # drm_dev_exit(idx);
>>
>>
>> Regards,
>> Leslie
>>
>> -----Original Message-----
>> From: Chen, Guchun <Guchun.Chen at amd.com>
>> Sent: Thursday, December 16, 2021 2:46 PM
>> To: Shi, Leslie <Yuliang.Shi at amd.com>; Grodzovsky, Andrey
>> <Andrey.Grodzovsky at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>;
>> Deucher, Alexander <Alexander.Deucher at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio()
>> iff device is unplugged to prevent crash in GPU initialization failure
>>
>> [Public]
>>
>> Hi Leslie,
>>
>> I think we need to modify it like:
>>
>> +if (drm_dev_enter(adev_to_drm(adev), &idx)) {
>> + amdgpu_device_unmap_mmio(adev);
>> + drm_dev_exit(idx);
>> +}
>>
>> Also you need to credit Andrey a 'suggested-by' in your patch.
>>
>> Regards,
>> Guchun
>>
>> -----Original Message-----
>> From: Shi, Leslie <Yuliang.Shi at amd.com>
>> Sent: Thursday, December 16, 2021 2:14 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian
>> <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>;
>> Deucher, Alexander <Alexander.Deucher at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Cc: Chen, Guchun <Guchun.Chen at amd.com>; Shi, Leslie
>> <Yuliang.Shi at amd.com>
>> Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff
>> device is unplugged to prevent crash in GPU initialization failure
>>
>> [Why]
>> In amdgpu_driver_load_kms, when amdgpu_device_init returns error
>> during driver modprobe, it will start the error handle path
>> immediately and call into amdgpu_device_unmap_mmio as well to release
>> mapped VRAM. However, in the following release callback, driver
>> stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr
>> in vcn_v3_0_sw_fini. So a kernel crash occurs.
>>
>> [How]
>> call amdgpu_device_unmap_mmio() iff device is unplugged to prevent
>> invalid memory address in
>> vcn_v3_0_sw_fini() when GPU initialization failure.
>>
>> Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index fb03d75880ec..d3656e7b60c2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct
>> amdgpu_device *adev)
>> */
>> void amdgpu_device_fini_hw(struct amdgpu_device *adev) {
>> + int idx;
>> +
>> dev_info(adev->dev, "amdgpu: finishing device.\n");
>> flush_delayed_work(&adev->delayed_init_work);
>> if (adev->mman.initialized) {
>> @@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct
>> amdgpu_device *adev)
>> amdgpu_gart_dummy_page_fini(adev);
>> - amdgpu_device_unmap_mmio(adev);
>> + if (!drm_dev_enter(adev_to_drm(adev), &idx))
>> + amdgpu_device_unmap_mmio(adev);
>> + else
>> + drm_dev_exit(idx);
>> +
>> }
>> void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>> --
>> 2.25.1
>
More information about the amd-gfx
mailing list