[PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

Alex Deucher alexdeucher at gmail.com
Fri Nov 30 20:08:36 UTC 2018


On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
<Andrey.Grodzovsky at amd.com> wrote:
>
>
>
> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> > <andrey.grodzovsky at amd.com> wrote:
> >> XGMI hive has some resources allocted on device init which
> >> needs to be deallocated when the device is unregistered.
> >>
> >> v2: Remove creation of dedicated wq for XGMI hive reset.
> >>
> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 ++++++++++++++++++++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
> >>   3 files changed, 24 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index c75badf..bfd286c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> >>   {
> >>          int i, r;
> >>
> >> +       if (adev->gmc.xgmi.num_physical_nodes > 1)
> >> +               amdgpu_xgmi_remove_device(adev);
> >> +
> >>          amdgpu_amdkfd_device_fini(adev);
> >>
> >>          amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> index fb37e69..38e1599 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> >>          mutex_unlock(&xgmi_mutex);
> >>          return ret;
> >>   }
> >> +
> >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
> >> +{
> >> +       struct amdgpu_hive_info *hive;
> >> +
> >> +       if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
> >> +               return;
> > It would be nice to have something better here to check against.  This
> > seems kind of fragile.  Can we check based on some xgmi related
> > structure?
> >
> > Alex
>
> This check is same as in amdgpu_xgmi_add_device, i guess we could
> look if adev->gmc.xgmi.head is empty or not since it's only gets added to
> hive->device_list if amdgpu_xgmi_add_device was called.
> Sounds ok ?

Sounds good.  We should probably fix up the other case as well at some point.

Thanks!

Alex

>
> Andrey
>
> >
> >> +
> >> +       mutex_lock(&xgmi_mutex);
> >> +
> >> +       hive = amdgpu_get_xgmi_hive(adev);
> >> +       if (!hive)
> >> +               goto exit;
> >> +
> >> +       if (!(hive->number_devices--))
> >> +               mutex_destroy(&hive->hive_lock);
> >> +
> >> +exit:
> >> +       mutex_unlock(&xgmi_mutex);
> >> +}
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> index 6335bfd..6151eb9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
> >>   struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
> >>   int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
> >>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
> >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
> >>
> >>   #endif
> >> --
> >> 2.7.4
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


More information about the amd-gfx mailing list