[PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Fri Nov 30 16:20:00 UTC 2018
On 11/30/2018 10:53 AM, Koenig, Christian wrote:
> Am 30.11.18 um 16:14 schrieb Grodzovsky, Andrey:
>> On 11/30/2018 04:03 AM, Christian König wrote:
>>> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>>>> XGMI hive has some resources allocated on device init which
>>>> need to be deallocated when the device is unregistered.
>>>>
>>>> Add a per-hive wq to allow all the nodes in the hive to run resets
>>>> concurrently - this should speed up the total reset time and avoid
>>>> breaching the PSP FW timeout.
>>> Do you really want a workqueue? That sounds like the exact opposite of
>>> what you describe here.
>>>
>>> E.g. a workqueue is there to serialize all work items scheduled to it.
>>>
>>> What you most likely rather want are per-device work items which are
>>> scheduled, run in parallel and are joined back together before
>>> dropping the per-hive lock.
>>>
>>> Christian.
>> As you can see in the second patch, that is exactly how I use this wq. I
>> define a per-adev work item, all of which I enqueue during the GPU reset
>> routine. I believe the wq I created will process work items in parallel,
>> since it is served by a per-CPU worker thread (thread pool), while a
>> serializing wq is created using the 'create_singlethread_workqueue'
>> interface, which creates an ordered wq that processes at most one item
>> at a time.
> That is correct, but you are still limited to one work item processed
> per CPU, which is not necessarily what you want.
I was wrong to say it's a single worker thread per CPU; it's actually a
worker thread pool per CPU, so even with a single CPU in the system you
get concurrent execution.
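
To illustrate the distinction, just a snippet for reference (the variable
names are made up, only the wq name and flags match the patch):

	/* ordered wq - processes at most one work item at a time */
	struct workqueue_struct *serial_wq =
		create_singlethread_workqueue("xgmi-hive");

	/* unbound wq - work items are handed to a shared kworker pool
	 * and can execute concurrently, even on a single-CPU system */
	struct workqueue_struct *parallel_wq =
		alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0);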
>
>> I did have my doubts about creating a dedicated wq instead of reusing an
>> existing system-wide wq such as system_unbound_wq, because this adds new
>> kernel threads, but eventually I opted for my own queue because there is
>> a hard requirement from the PSP FW to complete all ASIC resets within a
>> defined, finite amount of time (1s, I believe), otherwise some nodes in
>> the XGMI hive might switch to single-GPU mode due to the timeout.
>> If I rely on a system-wide wq, then under stress this queue might
>> fail to process my work items within that time limit because it is
>> overloaded by work items from other queue clients.
>> Maybe I should squash the 2 changes into one to make all of this
>> clearer?
> No, that won't help. You would rather need to document that properly.
>
> But creating a wq just for a rarely used device reset is complete
> overkill.
>
> Stuff like that should go to the system_highpri_wq work queue, which
> should already make sure that it never hits the PSP timeout.
>
> Christian.
I agree with that in general, but it looks like system_highpri_wq is a
'BOUND' wq, meaning its items are limited to the worker pool of the CPU
from which they were queued. Anyway, since I mostly just sleep inside the
PSP reset function, that should still be OK.
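
Roughly what I have in mind with system_highpri_wq (an untested sketch;
the xgmi_reset_work member of amdgpu_device and the work function are
placeholders, not part of this series):

static void amdgpu_xgmi_reset_func(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, xgmi_reset_work);

	/* mostly sleeps waiting for the PSP to finish the node reset */
	amdgpu_asic_reset(adev);
}

	struct amdgpu_device *tmp_adev;

	/* in the hive reset path, with hive->hive_lock held */
	list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
		INIT_WORK(&tmp_adev->xgmi_reset_work, amdgpu_xgmi_reset_func);
		queue_work(system_highpri_wq, &tmp_adev->xgmi_reset_work);
	}

	/* join the per-node resets back before dropping the hive lock */
	list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
		flush_work(&tmp_adev->xgmi_reset_work);

Using flush_work() on each item joins only our resets instead of flushing
the whole system queue.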
Will try that.
Andrey
>
>> Andrey
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 ++++++++++++++++++++++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 3 +++
>>>> 2 files changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> index fb37e69..9ac2dc5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>>> @@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
>>>>  	INIT_LIST_HEAD(&tmp->device_list);
>>>>  	mutex_init(&tmp->hive_lock);
>>>> +	tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0);
>>>> +
>>>>  	return tmp;
>>>>  }
>>>> @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>>>>  	mutex_unlock(&xgmi_mutex);
>>>>  	return ret;
>>>>  }
>>>> +
>>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>>>> +{
>>>> +	struct amdgpu_hive_info *hive;
>>>> +
>>>> +	if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
>>>> +		return;
>>>> +
>>>> +	mutex_lock(&xgmi_mutex);
>>>> +
>>>> +	hive = amdgpu_get_xgmi_hive(adev);
>>>> +	if (!hive)
>>>> +		goto exit;
>>>> +
>>>> +	if (!(hive->number_devices--)) {
>>>> +		mutex_destroy(&hive->hive_lock);
>>>> +		destroy_workqueue(hive->reset_queue);
>>>> +	}
>>>> +
>>>> +exit:
>>>> +	mutex_unlock(&xgmi_mutex);
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> index 6335bfd..285ab93 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>>> @@ -30,10 +30,13 @@ struct amdgpu_hive_info {
>>>>  	struct psp_xgmi_topology_info topology_info;
>>>>  	int number_devices;
>>>>  	struct mutex hive_lock;
>>>> +	/* hive members reset wq */
>>>> +	struct workqueue_struct *reset_queue;
>>>>  };
>>>>  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>>>>  int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
>>>>  int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>>>> #endif