[PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

Andrey Grodzovsky andrey.grodzovsky at amd.com
Wed Aug 17 14:01:28 UTC 2022


On 2022-08-17 09:44, Alex Deucher wrote:
> On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas <YiPeng.Chai at amd.com> wrote:
>> [AMD Official Use Only - General]
>>
>> Hi Alex:
>>    When removing an amdgpu device, it may be difficult to change the order of psp_hw_fini calls.
>>
>> 1. The drm_dev_unplug call is at the beginning in the amdgpu_pci_remove function,  which makes the gpu device inaccessible for userspace operations.  If the call to psp_hw_fini was moved before drm_dev_unplug,  userspace could access the gpu device but the psp might be removing. It has unknown issues.
>>
> +Andrey Grodzovsky
>
> We should fix the ordering in amdgpu_pci_remove() then I guess?  There
> are lots of places where drm_dev_enter() is used to protect access to
> the hardware which could be similarly affected.
>
> Alex


We probably can try to move drm_dev_unplug after 
amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must 
be the first thing we do in amdgpu_pci_remove and what impact it will 
have but maybe give it a try.
Also see if you can run libdrm hotplug test suite before and after the 
change.

Andrey


>
>> 2. psp_hw_fini is called by the .hw_fini iterator in amdgpu_device_ip_fini_early, referring to the code starting from amdgpu_pci_remove to .hw_fini is called,
>>     there are many preparatory operations before calling .hw_fini,  which makes it very difficult to change the order of psp_hw_fini or all block .hw_fini.
>>
>>     So can we do a workaround in psp_cmd_submit_buf when removing amdgpu device?
>>
>> -----Original Message-----
>> From: Alex Deucher <alexdeucher at gmail.com>
>> Sent: Monday, August 15, 2022 10:22 PM
>> To: Chai, Thomas <YiPeng.Chai at amd.com>
>> Cc: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled
>>
>> On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai <YiPeng.Chai at amd.com> wrote:
>>> The psp_cmd_submit_buf function is called by psp_hw_fini to send TA
>>> unload messages to psp to terminate ras, asd and tmr.
>>> But when amdgpu is uninstalled, drm_dev_unplug is called earlier than
>>> psp_hw_fini in amdgpu_pci_remove, the calling order as follows:
>>> static void amdgpu_pci_remove(struct pci_dev *pdev) {
>>>          drm_dev_unplug
>>>          ......
>>>          amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
>>>                  ->.hw_fini->psp_hw_fini->...
>>>                  ->psp_ta_unload->psp_cmd_submit_buf
>>>          ......
>>> }
>>> The program will return when calling drm_dev_enter in
>>> psp_cmd_submit_buf.
>>>
>>> So the call to drm_dev_enter in psp_cmd_submit_buf should be removed,
>>> so that the TA unload messages can be sent to the psp when amdgpu is
>>> uninstalled.
>>>
>>> Signed-off-by: YiPeng Chai <YiPeng.Chai at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ----
>>>   1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> index b067ce45d226..0578d8d094a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> @@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
>>>          if (psp->adev->no_hw_access)
>>>                  return 0;
>>>
>>> -       if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
>>> -               return 0;
>>> -
>> This check is to prevent the hardware from being accessed if the card is removed.  I think we need to fix the ordering elsewhere.
>>
>> Alex
>>
>>>          memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
>>>
>>>          memcpy(psp->cmd_buf_mem, cmd, sizeof(struct
>>> psp_gfx_cmd_resp)); @@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
>>>          }
>>>
>>>   exit:
>>> -       drm_dev_exit(idx);
>>>          return ret;
>>>   }
>>>
>>> --
>>> 2.25.1
>>>


More information about the amd-gfx mailing list