[PATCH] drm/amdgpu: Check hive->reset_domain not NULL before releasing it.
Felix Kuehling
felix.kuehling at amd.com
Tue Nov 1 19:25:25 UTC 2022
On 2022-11-01 14:49, Gavin Wan wrote:
> The recent change brought a bug on SRIOV environment. It caused
> kernel crashing while unloading amdgpu on guest VM with hive
> configuration. The reason is that the hive->reset_domain is not
> used (hive->reset_domain is not initialized) for SRIOV, but the
> code did not check if hive->reset_domain before releasing.
>
> The hive->reset_domain need be checked not NULL before releasing.
>
> Fixed: d95e8e97e2d5 ("drm/amdgpu: refine create and release logic of hive info")
The tag should be named "Fixes", not "Fixed".
> Signed-off-by: Gavin Wan <Gavin.Wan at amd.com>
> Change-Id: I17189e4d7357e399c6b70e43c24051356c025a3a
Please remove the Change-Id.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 47159e9a0884..371c4f1aac2b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -217,8 +217,15 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
>  	struct amdgpu_hive_info *hive = container_of(
>  		kobj, struct amdgpu_hive_info, kobj);
>
> -	amdgpu_reset_put_reset_domain(hive->reset_domain);
> -	hive->reset_domain = NULL;
> +	/**
Remove the extra *. /** is used to denote doc-comments, and this is not one.
> +	 * The hive->reset_domain is only initialized for none SRIOV
> +	 * configuration. It needs to check if hive->reset_domain is
> +	 * NULL.
> +	 */
> +	if (hive->reset_domain) {
> +		amdgpu_reset_put_reset_domain(hive->reset_domain);
It may be better to do the NULL pointer check inside
amdgpu_reset_put_reset_domain. In fact, current staging already has a
check there, so this patch is unnecessary. Just sync your branch. It was
added by this commit:
> commit d6a7ab1e0168a96b6cb0e386399e54af4fe39af4
> Author: Vignesh Chander <Vignesh.Chander at amd.com>
> Date: Wed Sep 28 14:59:45 2022 -0400
>
> drm/amdgpu: Skip put_reset_domain if it doesn't exist
>
> For xgmi sriov, the reset is handled by host driver and hive->reset_domain
> is not initialized so need to check if it exists before doing a put.
> Signed-off-by: Vignesh Chander <Vignesh.Chander at amd.com>
> Reviewed-by: Shaoyun Liu <Shaoyun.Liu at amd.com>
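
For illustration, here is a minimal userspace sketch of the pattern (a NULL
check inside the put helper, so that release paths need no check of their
own). All names below are hypothetical stand-ins, not the actual amdgpu code,
and the plain integer counter is an assumption standing in for the kernel's
reference counting:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's reset domain object. A plain
 * counter is used here for simplicity; this is an assumption of the sketch. */
struct reset_domain {
	int refcount;
};

/* Instrumentation for this example only: counts freed domains. */
static int domains_freed;

static struct reset_domain *reset_domain_create(void)
{
	struct reset_domain *d = malloc(sizeof(*d));

	if (d)
		d->refcount = 1;
	return d;
}

/* NULL-tolerant put: a domain that was never created is silently ignored. */
static void reset_domain_put(struct reset_domain *d)
{
	if (!d)
		return;
	if (--d->refcount == 0) {
		free(d);
		domains_freed++;
	}
}

/* Hypothetical hive: reset_domain stays NULL when it is never initialized,
 * as described above for SRIOV. */
struct hive {
	struct reset_domain *reset_domain;
};

/* The release path can call the put helper unconditionally. */
static void hive_release(struct hive *h)
{
	reset_domain_put(h->reset_domain);
	h->reset_domain = NULL;
}
```

With the check in the helper, every caller is covered at once, which is why
a separate check in the release function would be redundant.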
Regards,
Felix
> +		hive->reset_domain = NULL;
> +	}
>
>  	mutex_destroy(&hive->hive_lock);
>  	kfree(hive);