[PATCH] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset
Felix Kuehling
felix.kuehling at amd.com
Thu Apr 2 17:25:52 UTC 2020
[+Monk]
This looks reasonable to me. However, you're effectively reverting this
commit by Monk:
a03eb637d2a5 drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
In hindsight, Monk's commit was broken. Removing the call to pre_reset
has other consequences, such as breaking reset notifications to user
mode and probably invalidating some assumptions in kfd_post_reset.
Can you coordinate with Monk to work out why his change was needed, and
whether you'll need a different solution for the problem he was trying
to address?
In the meantime, this patch is
Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
On 2020-04-02 at 3:20 a.m., Jack Zhang wrote:
> kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate.
>
> Without this change, the SR-IOV TDR code path never frees those
> allocations, which causes a memory leak.
>
> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8faaa17..832daf7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3847,6 +3847,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
> {
> int r;
>
> + amdgpu_amdkfd_pre_reset(adev);
> +
> if (from_hypervisor)
> r = amdgpu_virt_request_full_gpu(adev, true);
> else
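To make the leak described in the commit message concrete, here is a toy user-space model in C. Every name in it (toy_kfd, toy_sa_allocate, toy_kfd_pre_reset, toy_reset_sriov, toy_slots_after_resets) is hypothetical; it only mimics the ordering of kfd_gtt_sa_allocate, amdgpu_amdkfd_pre_reset and the SR-IOV reset path. It assumes, as a simplification, that re-initialization after a reset allocates fresh objects and drops its references to the old ones, which is how skipping pre_reset turns into a leak.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not an amdgpu/amdkfd API. */

#define POOL_SLOTS    16 /* size of the pretend GTT suballocation pool */
#define OBJS_PER_INIT 2  /* objects the pretend KFD allocates when it (re)starts */

struct toy_kfd {
	bool slot_used[POOL_SLOTS]; /* pool bookkeeping, like a suballocator bitmap */
	int owned[OBJS_PER_INIT];   /* slots the pretend KFD currently references */
	int nr_owned;
};

/* Stand-in for kfd_gtt_sa_allocate(): returns a free slot or -1. */
static int toy_sa_allocate(struct toy_kfd *k)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!k->slot_used[i]) {
			k->slot_used[i] = true;
			return i;
		}
	}
	return -1; /* pool exhausted */
}

static void toy_sa_free(struct toy_kfd *k, int slot)
{
	assert(k->slot_used[slot]); /* catch double frees in the model */
	k->slot_used[slot] = false;
}

/* Pretend start/post_reset: allocates fresh objects and forgets old references. */
static int toy_kfd_start(struct toy_kfd *k)
{
	k->nr_owned = 0;
	for (int i = 0; i < OBJS_PER_INIT; i++) {
		int slot = toy_sa_allocate(k);

		if (slot < 0)
			return -1;
		k->owned[k->nr_owned++] = slot;
	}
	return 0;
}

/* Pretend pre_reset: returns everything the KFD holds before the reset. */
static void toy_kfd_pre_reset(struct toy_kfd *k)
{
	for (int i = 0; i < k->nr_owned; i++)
		toy_sa_free(k, k->owned[i]);
	k->nr_owned = 0;
}

/* Pretend SR-IOV reset path; with_pre_reset mirrors the one-line patch. */
static int toy_reset_sriov(struct toy_kfd *k, bool with_pre_reset)
{
	if (with_pre_reset)
		toy_kfd_pre_reset(k);
	/* ... hypervisor handshake and ASIC reset omitted ... */
	return toy_kfd_start(k);
}

static int toy_slots_in_use(const struct toy_kfd *k)
{
	int n = 0;

	for (int i = 0; i < POOL_SLOTS; i++)
		n += k->slot_used[i];
	return n;
}

/* Runs `resets` SR-IOV resets; returns slots in use, or -1 if the pool ran out. */
int toy_slots_after_resets(bool with_pre_reset, int resets)
{
	struct toy_kfd k = {0};

	if (toy_kfd_start(&k) < 0)
		return -1;
	for (int r = 0; r < resets; r++) {
		if (toy_reset_sriov(&k, with_pre_reset) < 0)
			return -1;
	}
	return toy_slots_in_use(&k);
}
```

In this model, toy_slots_after_resets(false, 3) returns 8 while toy_slots_after_resets(true, 3) returns 2: without the pre-reset hook every reset strands the previous allocations, and after enough resets the pretend pool is exhausted.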