[PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

Alex Deucher alexdeucher at gmail.com
Fri Nov 30 20:20:13 UTC 2018


On Fri, Nov 30, 2018 at 3:12 PM Grodzovsky, Andrey
<Andrey.Grodzovsky at amd.com> wrote:
>
>
> On 11/30/2018 03:08 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> > <Andrey.Grodzovsky at amd.com> wrote:
> >>
> >>
> >> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> >>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> >>> <andrey.grodzovsky at amd.com> wrote:
> >>>> XGMI hive has some resources allocted on device init which
> >>>> needs to be deallocated when the device is unregistered.
> >>>>
> >>>> v2: Remove creation of dedicated wq for XGMI hive reset.
> >>>>
> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 ++++++++++++++++++++
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
> >>>>    3 files changed, 24 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> index c75badf..bfd286c 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> >>>>    {
> >>>>           int i, r;
> >>>>
> >>>> +       if (adev->gmc.xgmi.num_physical_nodes > 1)
> >>>> +               amdgpu_xgmi_remove_device(adev);
> >>>> +
> >>>>           amdgpu_amdkfd_device_fini(adev);
> >>>>
> >>>>           amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >>>> index fb37e69..38e1599 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >>>> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> >>>>           mutex_unlock(&xgmi_mutex);
> >>>>           return ret;
> >>>>    }
> >>>> +
> >>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
> >>>> +{
> >>>> +       struct amdgpu_hive_info *hive;
> >>>> +
> >>>> +       if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
> >>>> +               return;
> >>> It would be nice to have something better here to check against.  This
> >>> seems kind of fragile.  Can we check based on some xgmi related
> >>> structure?
> >>>
> >>> Alex
> >> This check is same as in amdgpu_xgmi_add_device, i guess we could
> >> look if adev->gmc.xgmi.head is empty or not since it's only gets added to
> >> hive->device_list if amdgpu_xgmi_add_device was called.
> >> Sounds ok ?
> > Sounds good.  We should probably fix up the other case as well at some point.
> >
> > Thanks!
> >
> > Alex
>
> The other case is in amdgpu_xgmi_add_device - at the beginning - just
> avoids XGMI code path for non VEGA20 dGPUs - how can we avoid this check ?
>

Maybe something like adev->gmc.xgmi.supported and set it in
soc15_set_ip_blocks() for vega20.  We should do something similar for
ras.  That way for new asics we only need to set the variable rather
than updating asic checks scattered around the code.

Alex

> Andrey
>
>
> >
> >> Andrey
> >>
> >>>> +
> >>>> +       mutex_lock(&xgmi_mutex);
> >>>> +
> >>>> +       hive = amdgpu_get_xgmi_hive(adev);
> >>>> +       if (!hive)
> >>>> +               goto exit;
> >>>> +
> >>>> +       if (!(hive->number_devices--))
> >>>> +               mutex_destroy(&hive->hive_lock);
> >>>> +
> >>>> +exit:
> >>>> +       mutex_unlock(&xgmi_mutex);
> >>>> +}
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >>>> index 6335bfd..6151eb9 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >>>> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
> >>>>    struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
> >>>>    int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
> >>>>    int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
> >>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
> >>>>
> >>>>    #endif
> >>>> --
> >>>> 2.7.4
> >>>>
> >>>> _______________________________________________
> >>>> amd-gfx mailing list
> >>>> amd-gfx at lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


More information about the amd-gfx mailing list