[PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device

Christian König ckoenig.leichtzumerken at gmail.com
Tue Mar 7 06:22:18 UTC 2023


The commit message reads a bit bumpy. Generally best practice are:

Short (72 chars or less) summary

More detailed explanatory text. Wrap it to 72 characters. The blank
line separating the summary from the body is critical (unless you omit
the body entirely).

Write your commit message in the imperative: "Fix bug" and not "Fixed
bug" or "Fixes bug." This convention matches up with commit messages
generated by commands like git merge and git revert.

Further paragraphs come after blank lines.

- Bullet points are okay, too.
- Typically a hyphen or asterisk is used for the bullet, followed by a
   single space. Use a hanging indent.

Apart from that the patch is Acked-by: Christian König 
<christian.koenig at amd.com>

Regards,
Christian.

Am 07.03.23 um 03:23 schrieb Chen, Guchun:
> Reviewed-by: Guchun Chen <guchun.chen at amd.com>
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: lyndonli <Lyndon.Li at amd.com>
> Sent: Tuesday, March 7, 2023 10:12 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Prosyak, Vitaly <Vitaly.Prosyak at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Ma, Jun <Jun.Ma2 at amd.com>; Li, Lyndon <Lyndon.Li at amd.com>
> Subject: [PATCH] drm/amdgpu: Fix the warning info when removing amdgpu device
>
> Actually, the drm_dev_enter in psp_cmd_submit_buf does not protect anything.
> And it is not used to prevent concurrent access.
> If DRM device is unplugged, it will always check the condition in WARN_ON.
> We'd better not keep adding commands to the list.
> Simply moving the drm_dev_enter/drm_dev_exit higher level will not solve the issue.
> Because psp_cmd_submit_buf is called in many places, such as psp_hw_init-->psp_load_fw, psp_suspend-->psp_xgmi_terminate, amdgpu_device_gpu_recover-->amdgpu_ras_suspend.
> So drop drm_dev_enter/drm_dev_exit in psp_cmd_submit_buf.
>
> When removing amdgpu, the calling order as follows:
> amdgpu_pci_remove
> 	drm_dev_unplug
> 	amdgpu_driver_unload_kms
> 		amdgpu_device_fini_hw
> 			amdgpu_device_ip_fini_early
> 				psp_hw_fini
> 					psp_ras_terminate
> 						psp_ta_unloadye
> 							psp_cmd_submit_buf
>
> [ 4507.740388] Call Trace:
> [ 4507.740389]  <TASK>
> [ 4507.740391]  psp_ta_unload+0x44/0x70 [amdgpu] [ 4507.740485]  psp_ras_terminate+0x4d/0x70 [amdgpu] [ 4507.740575]  psp_hw_fini+0x28/0xa0 [amdgpu] [ 4507.740662]  amdgpu_device_fini_hw+0x328/0x442 [amdgpu] [ 4507.740791]  amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 4507.740875]  amdgpu_pci_remove+0x5a/0x140 [amdgpu] [ 4507.740962]  ? _raw_spin_unlock_irqrestore+0x27/0x43
> [ 4507.740965]  ? __pm_runtime_resume+0x60/0x90 [ 4507.740968]  pci_device_remove+0x39/0xb0 [ 4507.740971]  device_remove+0x46/0x70 [ 4507.740972]  device_release_driver_internal+0xd1/0x160
> [ 4507.740974]  driver_detach+0x4a/0x90
> [ 4507.740975]  bus_remove_driver+0x6c/0xf0 [ 4507.740976]  driver_unregister+0x31/0x50 [ 4507.740977]  pci_unregister_driver+0x40/0x90 [ 4507.740978]  amdgpu_exit+0x15/0x120 [amdgpu]
>
> Signed-off-by: lyndonli <Lyndon.Li at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 17 +----------------
>   1 file changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 4c617faaa7c9..02f948adae72 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -603,27 +603,14 @@ psp_cmd_submit_buf(struct psp_context *psp,
>   		   struct psp_gfx_cmd_resp *cmd, uint64_t fence_mc_addr)  {
>   	int ret;
> -	int index, idx;
> +	int index;
>   	int timeout = 20000;
>   	bool ras_intr = false;
>   	bool skip_unsupport = false;
> -	bool dev_entered;
>   
>   	if (psp->adev->no_hw_access)
>   		return 0;
>   
> -	dev_entered = drm_dev_enter(adev_to_drm(psp->adev), &idx);
> -	/*
> -	 * We allow sending PSP messages LOAD_ASD and UNLOAD_TA without acquiring
> -	 * a lock in drm_dev_enter during driver unload because we must call
> -	 * drm_dev_unplug as the beginning  of unload driver sequence . It is very
> -	 * crucial that userspace can't access device instances anymore.
> -	 */
> -	if (!dev_entered)
> -		WARN_ON(psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_LOAD_ASD &&
> -			psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_UNLOAD_TA &&
> -			psp->cmd_buf_mem->cmd_id != GFX_CMD_ID_INVOKE_CMD);
> -
>   	memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
>   
>   	memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp)); @@ -687,8 +674,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
>   	}
>   
>   exit:
> -	if (dev_entered)
> -		drm_dev_exit(idx);
>   	return ret;
>   }
>   
> --
> 2.34.1
>



More information about the amd-gfx mailing list