[PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Fri Nov 30 20:12:32 UTC 2018


On 11/30/2018 03:08 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> <Andrey.Grodzovsky at amd.com> wrote:
>>
>>
>> On 11/30/2018 02:49 PM, Alex Deucher wrote:
>>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
>>> <andrey.grodzovsky at amd.com> wrote:
>>>> XGMI hive has some resources allocted on device init which
>>>> needs to be deallocated when the device is unregistered.
>>>>
>>>> v2: Remove creation of dedicated wq for XGMI hive reset.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 ++++++++++++++++++++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
>>>>    3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index c75badf..bfd286c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
>>>>    {
>>>>           int i, r;
>>>>
>>>> +       if (adev->gmc.xgmi.num_physical_nodes > 1)
>>>> +               amdgpu_xgmi_remove_device(adev);
>>>> +
>>>>           amdgpu_amdkfd_device_fini(adev);
>>>>
>>>>           amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> index fb37e69..38e1599 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>>>>           mutex_unlock(&xgmi_mutex);
>>>>           return ret;
>>>>    }
>>>> +
>>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>>>> +{
>>>> +       struct amdgpu_hive_info *hive;
>>>> +
>>>> +       if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
>>>> +               return;
>>> It would be nice to have something better here to check against.  This
>>> seems kind of fragile.  Can we check based on some xgmi related
>>> structure?
>>>
>>> Alex
>> This check is same as in amdgpu_xgmi_add_device, i guess we could
>> look if adev->gmc.xgmi.head is empty or not since it's only gets added to
>> hive->device_list if amdgpu_xgmi_add_device was called.
>> Sounds ok ?
> Sounds good.  We should probably fix up the other case as well at some point.
>
> Thanks!
>
> Alex

The other case is in amdgpu_xgmi_add_device - at the beginning - just 
avoids XGMI code path for non VEGA20 dGPUs - how can we avoid this check ?

Andrey


>
>> Andrey
>>
>>>> +
>>>> +       mutex_lock(&xgmi_mutex);
>>>> +
>>>> +       hive = amdgpu_get_xgmi_hive(adev);
>>>> +       if (!hive)
>>>> +               goto exit;
>>>> +
>>>> +       if (!(hive->number_devices--))
>>>> +               mutex_destroy(&hive->hive_lock);
>>>> +
>>>> +exit:
>>>> +       mutex_unlock(&xgmi_mutex);
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> index 6335bfd..6151eb9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
>>>>    struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>>>>    int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
>>>>    int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>>>>
>>>>    #endif
>>>> --
>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list