[PATCH] drm/amdgpu: fix a memory protection fault when remove amdgpu device

Chen, Jiansong (Simon) Jiansong.Chen at amd.com
Wed Dec 30 07:32:14 UTC 2020


-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Dennis Li
Sent: Wednesday, December 30, 2020 2:46 PM
To: amd-gfx at lists.freedesktop.org; Clements, John <John.Clements at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Li, Dennis <Dennis.Li at amd.com>
Subject: [PATCH] drm/amdgpu: fix a memory protection fault when remove amdgpu device

ASD and TA share the same firmware on SIENNA_CICHLID, and only the TA firmware is requested during boot, so only the TA firmware needs to be released when the device is removed.

[   83.877150] general protection fault, probably for non-canonical address 0x1269f97e6ed04095: 0000 [#1] SMP PTI
[   83.888076] CPU: 0 PID: 1312 Comm: modprobe Tainted: G        W  OE     5.9.0-rc5-deli-amd-vangogh-0.0.6.6-114-gdd99d5669a96-dirty #2
[   83.901160] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   83.912353] RIP: 0010:free_fw_priv+0xc/0x120
[   83.917531] Code: e8 99 cd b0 ff b8 a1 ff ff ff eb 9f 4c 89 f7 e8 8a cd b0 ff b8 f4 ff ff ff eb 90 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <4c> 8b 67 18 48 89 fb 4c 89 e7 e8 45 94 41 00 b8 ff ff ff ff f0 0f
[   83.937576] RSP: 0018:ffffbc34c13a3ce0 EFLAGS: 00010206
[   83.943699] RAX: ffffffffbb681850 RBX: ffffa047f117eb60 RCX: 0000000080800055
[   83.951879] RDX: ffffbc34c1d5f000 RSI: 0000000080800055 RDI: 1269f97e6ed04095
[   83.959955] RBP: ffffbc34c13a3cf0 R08: 0000000000000000 R09: 0000000000000001
[   83.968107] R10: ffffbc34c13a3cc8 R11: 00000000ffffff00 R12: ffffa047d6b23378
[   83.976166] R13: ffffa047d6b23338 R14: ffffa047d6b240c8 R15: 0000000000000000
[   83.984295] FS:  00007f74f6712540(0000) GS:ffffa047fbe00000(0000) knlGS:0000000000000000
[   83.993323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.000056] CR2: 0000556a1cca4e18 CR3: 000000021faa8004 CR4: 00000000003706f0
[   84.008128] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   84.016155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   84.024174] Call Trace:
[   84.027514]  release_firmware.part.11+0x4b/0x70
[   84.033017]  release_firmware+0x13/0x20
[   84.037803]  psp_sw_fini+0x77/0xb0 [amdgpu]
[   84.042857]  amdgpu_device_fini+0x38c/0x5d0 [amdgpu]
[   84.048815]  amdgpu_driver_unload_kms+0x43/0x70 [amdgpu]
[   84.055055]  drm_dev_unregister+0x73/0xb0 [drm]
[   84.060499]  drm_dev_unplug+0x28/0x30 [drm]
[   84.065598]  amdgpu_dev_uninit+0x1b/0x40 [amdgpu]
[   84.071223]  amdgpu_pci_remove+0x4e/0x70 [amdgpu]
[   84.076835]  pci_device_remove+0x3e/0xc0
[   84.081609]  device_release_driver_internal+0xfb/0x1c0
[   84.087558]  driver_detach+0x4d/0xa0
[   84.092041]  bus_remove_driver+0x5f/0xe0
[   84.096854]  driver_unregister+0x2f/0x50
[   84.101594]  pci_unregister_driver+0x22/0xa0
[   84.106806]  amdgpu_exit+0x15/0x2b [amdgpu]

Signed-off-by: Dennis Li <Dennis.Li at amd.com>
Change-Id: Icc981a421499dff844855d5a662e91d1730c2754

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index eb19ae734396..b44b46dd60f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -564,7 +564,7 @@ static int psp_asd_load(struct psp_context *psp)
  * add workaround to bypass it for sriov now.
  * TODO: add version check to make it common
  */
-if (amdgpu_sriov_vf(psp->adev) || !psp->asd_fw)
+if (amdgpu_sriov_vf(psp->adev) || !psp->asd_ucode_size)
 return 0;

 cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
@@ -2779,11 +2779,10 @@ static int parse_ta_bin_descriptor(struct psp_context *psp,
switch (desc->fw_type) {
 case TA_FW_TYPE_PSP_ASD:
-psp->asd_fw_version   = le32_to_cpu(desc->fw_version);
+psp->asd_fw_version        = le32_to_cpu(desc->fw_version);
 psp->asd_feature_version   = le32_to_cpu(desc->fw_version);
-psp->asd_ucode_size   = le32_to_cpu(desc->size_bytes);
+psp->asd_ucode_size        = le32_to_cpu(desc->size_bytes);

The two changes above are whitespace-only and seem unrelated to this fix.

 psp->asd_start_addr    = ucode_start_addr;
-psp->asd_fw                = psp->ta_fw;
 break;
 case TA_FW_TYPE_PSP_XGMI:
 psp->ta_xgmi_ucode_version = le32_to_cpu(desc->fw_version);
--
2.17.1
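
For readers following the fix, here is a minimal userspace sketch of the ownership rule the commit message describes: one firmware blob is requested, the ASD payload lives inside it, so ASD presence is tracked by size and the blob is released exactly once. It is an illustration under stated assumptions, not the driver code. struct fw_blob, struct psp_model, model_release() and the other model_* helpers are hypothetical stand-ins for struct firmware, struct psp_context, release_firmware() and the psp init/fini paths. The reading that the pre-patch alias (asd_fw = ta_fw) led to the same blob being released twice is inferred from the trace and the diff, not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct firmware. */
struct fw_blob {
	size_t size;
};

/* Hypothetical, reduced stand-in for struct psp_context. */
struct psp_model {
	struct fw_blob *ta_fw;   /* the only blob that is actually requested */
	size_t asd_ucode_size;   /* ASD payload size parsed out of the TA image */
};

static int release_count;        /* how many blobs have been freed so far */

/* Models release_firmware(): frees the blob, and a NULL pointer is a no-op. */
static void model_release(struct fw_blob *fw)
{
	if (!fw)
		return;
	free(fw);
	release_count++;
}

/* Models boot: request one TA blob and record the ASD size found in it. */
static int model_init(struct psp_model *psp, size_t asd_size)
{
	psp->ta_fw = malloc(sizeof(*psp->ta_fw));
	if (!psp->ta_fw)
		return -1;
	psp->ta_fw->size = asd_size;
	psp->asd_ucode_size = asd_size;
	return 0;
}

/* Post-patch style check: ASD presence is decided by size, not by a pointer. */
static bool model_asd_present(const struct psp_model *psp)
{
	return psp->asd_ucode_size != 0;
}

/*
 * Models device removal. There is no second pointer aliasing ta_fw, so the
 * blob has a single owner and is released once. Clearing the pointer makes
 * a repeated teardown harmless.
 */
static void model_fini(struct psp_model *psp)
{
	model_release(psp->ta_fw);
	psp->ta_fw = NULL;
}
```

With an aliased second pointer, a teardown that releases both pointers would free the same allocation twice, which is the kind of failure the trace above points at. Keeping a size field instead of an alias removes the second owner entirely.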
