[PATCH] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

Felix Kuehling felix.kuehling at amd.com
Thu Apr 2 17:25:52 UTC 2020


[+Monk]

This looks reasonable to me. However, you're effectively reverting this
commit by Monk:

a03eb637d2a5 drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

In hind-sight, Monk's commit was broken. Removing the call to pre_reset
has other consequences, such as breaking notifications about reset to
user mode, and probably invalidating some assumptions in kfd_post_reset.
Can you coordinate with Monk to work out why his change was needed, and
whether you'll need a different solution for the problem he was trying
to address?

In the meanwhile, this patch is

Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>


Am 2020-04-02 um 3:20 a.m. schrieb Jack Zhang:

> kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate
>
> Without this change, sriov tdr code path will never free those allocated
> memories and get memory leak.
>
> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8faaa17..832daf7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3847,6 +3847,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>  {
>  	int r;
>  
> +	amdgpu_amdkfd_pre_reset(adev);
> +
>  	if (from_hypervisor)
>  		r = amdgpu_virt_request_full_gpu(adev, true);
>  	else


More information about the amd-gfx mailing list