[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

Luben Tuikov luben.tuikov at amd.com
Fri May 6 05:41:28 UTC 2022



On 2022-05-05 17:47, Andrey Grodzovsky wrote:
> 
> On 2022-05-05 15:49, Felix Kuehling wrote:
>>
>> Am 2022-05-05 um 14:57 schrieb Andrey Grodzovsky:
>>>
>>> On 2022-05-05 11:06, Christian König wrote:
>>>> Am 05.05.22 um 15:54 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 2022-05-05 09:23, Christian König wrote:
>>>>>> Am 05.05.22 um 15:15 schrieb Andrey Grodzovsky:
>>>>>>> On 2022-05-05 06:09, Christian König wrote:
>>>>>>>
>>>>>>>> Am 04.05.22 um 18:18 schrieb Andrey Grodzovsky:
>>>>>>>>> Problem:
>>>>>>>>> During a hive reset caused by a command timing out on a ring,
>>>>>>>>> extra resets are triggered by KFD, which is unable to
>>>>>>>>> access registers on the resetting ASIC.
>>>>>>>>>
>>>>>>>>> Fix: Rework GPU reset to use a list of pending reset jobs
>>>>>>>>> such that the first reset job that actually resets the entire
>>>>>>>>> reset domain will cancel all those pending redundant resets.
>>>>>>>>>
>>>>>>>>> This is in line with what we already do for redundant TDRs
>>>>>>>>> in scheduler code.
>>>>>>>>
>>>>>>>> Mhm, why exactly do you need the extra linked list then?
>>>>>>>>
>>>>>>>> Let's talk about that on our call today.
>>>>>>>
>>>>>>>
>>>>>>> Going to miss it, as you know, and this is the place to
>>>>>>> discuss technical questions anyway, so -
>>>>>>
>>>>>> Good point.
>>>>>>
>>>>>>> It's needed because those other resets are not timeout handlers
>>>>>>> that are governed by the scheduler, but rather external resets
>>>>>>> that are triggered by such clients as KFD, RAS and sysfs. The
>>>>>>> scheduler has no knowledge of them (and should not have), but
>>>>>>> they are serialized into the same wq as the TO handlers from
>>>>>>> the scheduler. It just happens that a TO-triggered reset in turn
>>>>>>> causes another reset (from KFD in this case), and we want to
>>>>>>> prevent this second reset from taking place, just as we want to
>>>>>>> avoid multiple TO resets taking place in the scheduler code.
>>>>>>
>>>>>> Yeah, but why do you need multiple workers?
>>>>>>
>>>>>> You have a single worker for the GPU reset not triggered by the
>>>>>> scheduler in your adev, and cancel that at the end of the reset
>>>>>> procedure.
>>>>>>
>>>>>> If anybody thinks it needs to trigger another reset while in
>>>>>> reset (which is actually a separate small design bug), the reset
>>>>>> will just be canceled in the same way we cancel the scheduler
>>>>>> resets.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> I had this in mind at first, but then I realized that each client
>>>>> (RAS, KFD and sysfs) will want to fill in its own data for the
>>>>> work (see amdgpu_device_gpu_recover) - for an XGMI hive each will
>>>>> want to set its own adev (which is fine if you set a work per
>>>>> adev as you suggest), but each client might also want to set its
>>>>> own bad job value (they all put NULL there today, but in theory
>>>>> could in the future), and here you might have a collision.
>>>>
>>>> Yeah, but that is intentional. See, when we have a job, it needs
>>>> to be consumed by the reset handler and not overwritten.
>>>
>>>
>>> I am not sure why this is a requirement; multiple clients can
>>> concurrently decide to trigger a reset (possibly for independent
>>> reasons), hence they cannot share the same work struct to pass
>>> their data to it.
>>
>> If those concurrent clients could detect that a reset was already in 
>> progress, you wouldn't need the complexity of multiple work structs 
>> being scheduled. You could simply return without triggering another 
>> reset.
> 
> 
> In my view the main problem here with a single work struct, either at
> the reset domain level or even the adev level, is that in some cases
> we optimize resets and don't really perform an ASIC HW reset (see
> amdgpu_job_timedout with soft recovery, and skip_hw_reset in
> amdgpu_device_gpu_recover_imp for the case where the bad job gets
> signaled just before we start the HW reset and we just skip it). If
> many different reset sources share the same work struct, what can
> happen is that the first to obtain the lock you describe below might
> opt out of a full HW reset because his bad job

The problem is this "opting out" of reset--meaning that the client didn't do
the complete recovery work, and was too quick to call for a GPU reset. So
they need to fix this locally in their code.

Generally when a GPU reset is scheduled, it should proceed regardless,
and there shouldn't be any opting out.

> did signal, for example, or because his hung IP block was able to
> recover through a SW reset, but in the meantime another reset source
> that needed an actual HW reset just silently returned, and we end up
> with an unhandled reset request. True, today this happens only to job
> timeout reset sources that are handled from within the scheduler and
> won't use this single work struct, but nothing prevents a future case
> of this happening, and also, if we actually want to unify the
> scheduler timeout handlers within the reset domain (which seems to me
> the right design approach), we won't be able to use just one work
> struct for this reason anyway.

Felix's idea is good, and here is another idea, which I've implemented in
open source when having to handle an infinite number of wire events of a
finite number of types (5 event types), in finite memory.

Instead of allocating a myriad of reset events when each client wants
a GPU reset, you'd have only one instance of a "reset" event, say an instance
of a linked list entry, call it R. Initially R is list_init(), so it points
to itself.

Anytime someone wants to do a reset, they do a list_move(R, event work list)
(or list_move_tail() if you want the events chronologically ordered)
and wake up a thread. If R is already on the list, meaning another client
has already scheduled the reset, then effectively nothing changes. (You
could check whether R is non-empty, and that check would tell you whether
R is scheduled to be processed, or you could just do the list_move() and
up(processing thread) always, to mitigate races.)

Note that many clients can call list_move(R, event work list), and the
result is the same--a reset is pending to occur, regardless of whether it
had already been scheduled or this call is the one that schedules it.

Then when the processing thread wakes up, it starts consuming "events" from
that linked list, removing them from it and processing them. Obviously,
access to the linked list needs to be guarded by a lock.

If you have a set of different types of events, say "reset", "start",
"stop", "load", etc., you can create an array of linked-list event structs,
indexed by macros whose values are the indices into the array and whose
names are the event types.

Since here it's just a reset event, you could have a single instance of a
linked list.

Anyway, just an idea of an old old implementation of mine.
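
For concreteness, here's a minimal sketch of that single-event scheme. All
the names here (event_work_list, request_reset, do_gpu_reset, etc.) are
hypothetical, just for illustration--this is not actual amdgpu code:

/* Sketch only: one statically allocated "reset" event, R. */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/semaphore.h>
#include <linux/kthread.h>

static LIST_HEAD(event_work_list);      /* events pending processing */
static DEFINE_SPINLOCK(event_lock);     /* guards the list */
static struct semaphore event_sem;      /* sema_init(&event_sem, 0) at init */

/* The single "reset" event instance, R; initially points to itself. */
static LIST_HEAD(reset_event);

static void do_gpu_reset(void);         /* hypothetical reset entry point */

/* Any client may call this. If R is already queued, the move is
 * harmless and the reset stays pending exactly once. */
static void request_reset(void)
{
        spin_lock(&event_lock);
        list_move_tail(&reset_event, &event_work_list);
        spin_unlock(&event_lock);
        up(&event_sem);                 /* always wake, to mitigate races */
}

static int event_thread(void *arg)
{
        while (!kthread_should_stop()) {
                if (down_interruptible(&event_sem))
                        continue;
                spin_lock(&event_lock);
                while (!list_empty(&event_work_list)) {
                        struct list_head *e = event_work_list.next;

                        list_del_init(e);       /* R points to itself again */
                        spin_unlock(&event_lock);
                        if (e == &reset_event)  /* identify event by entry */
                                do_gpu_reset();
                        spin_lock(&event_lock);
                }
                spin_unlock(&event_lock);
        }
        return 0;
}

No matter how many clients request a reset while one is pending, R can be
on the list at most once, so the thread performs at most one reset per
pending request, in finite memory.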

> 
> Andrey
> 
> 
>>
>> I'd put the reset work struct into the reset_domain struct. That way 
>> you'd have exactly one worker for the reset domain. You could 
>> implement a lock-less scheme to decide whether you need to schedule a 
>> reset, e.g. using an atomic counter in the shared work struct that 
>> gets incremented when a client wants to trigger a reset 
>> (atomic_add_return). If that counter is exactly 1 after incrementing, 
>> you need to fill in the rest of the work struct and schedule the work. 
>> Otherwise, it's already scheduled (or another client is in the process 
>> of scheduling it) and you just return. When the worker finishes (after 
>> confirming a successful reset), it resets the counter to 0, so the 
>> next client requesting a reset will schedule the worker again.
>>
>> Regards,
>>   Felix
>>
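
For reference, a minimal sketch of the counter scheme Felix describes here;
the struct and function names are hypothetical, not actual amdgpu code:

/* Sketch only: one shared work struct per reset domain. */
#include <linux/atomic.h>
#include <linux/workqueue.h>

struct reset_domain_sketch {
        struct workqueue_struct *wq;
        struct work_struct reset_work;
        atomic_t reset_count;           /* 0 == no reset pending */
};

static void do_gpu_reset(struct reset_domain_sketch *d); /* hypothetical */

static void reset_worker(struct work_struct *work)
{
        struct reset_domain_sketch *d =
                container_of(work, struct reset_domain_sketch, reset_work);

        do_gpu_reset(d);
        /* Reopen for new requests only after a confirmed reset. */
        atomic_set(&d->reset_count, 0);
}

/* Returns true if this caller scheduled the reset; false if a reset was
 * already scheduled (or is being scheduled) by another client. */
static bool request_reset(struct reset_domain_sketch *d)
{
        if (atomic_add_return(1, &d->reset_count) != 1)
                return false;

        /* This caller won the race: fill in the rest of the work
         * struct here, then schedule it. */
        queue_work(d->wq, &d->reset_work);
        return true;
}

The atomic_add_return() makes exactly one concurrent caller see the value 1;
every other caller just returns, which is the "simply return without
triggering another reset" behavior described above.
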
>>
>>>
>>>
>>>>
>>>>
>>>> In addition to that, keep in mind that you can't allocate any
>>>> memory before or during the GPU reset, nor wait for the reset to
>>>> complete (so you can't allocate anything on the stack either).
>>>
>>>
>>> There is no dynamic allocation here. Regarding stack allocations -
>>> we do them all the time when we call functions, even during GPU
>>> resets; how is the on-stack allocation of the work struct in
>>> amdgpu_device_gpu_recover different from any other local variable
>>> we allocate in any function we call?
>>>
>>> I am also not sure why it's not allowed to wait for the reset to
>>> complete? Also, see amdgpu_ras_do_recovery and gpu_recover_get
>>> (debugfs) - the caller expects the reset to complete before
>>> returning. I can probably work around it in the RAS code by calling
>>> atomic_set(&ras->in_recovery, 0) from some callback within the
>>> actual reset function, but sysfs actually expects a returned result
>>> indicating whether the call was successful or not.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> I don't think that concept you try here will work.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Also, in general it seems to me a cleaner approach where this
>>>>> logic (the work items) is held and handled in the reset_domain
>>>>> and not split across each adev or any other entity. We might in
>>>>> the future even want to move the scheduler handling into the
>>>>> reset domain, since the reset domain is supposed to be a generic
>>>>> thing and not only for AMD.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>>> Tested-by: Bai Zoy <Zoy.Bai at amd.com>
>>>>>>>>> ---
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 11 +---
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  3 +
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 73 
>>>>>>>>> +++++++++++++++++++++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  3 +-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  7 ++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  7 ++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  7 ++-
>>>>>>>>>   8 files changed, 104 insertions(+), 24 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> index 4264abc5604d..99efd8317547 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> @@ -109,6 +109,7 @@
>>>>>>>>>   #include "amdgpu_fdinfo.h"
>>>>>>>>>   #include "amdgpu_mca.h"
>>>>>>>>>   #include "amdgpu_ras.h"
>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>     #define MAX_GPU_INSTANCE        16
>>>>>>>>>   @@ -509,16 +510,6 @@ struct amdgpu_allowed_register_entry {
>>>>>>>>>       bool grbm_indexed;
>>>>>>>>>   };
>>>>>>>>>   -enum amd_reset_method {
>>>>>>>>> -    AMD_RESET_METHOD_NONE = -1,
>>>>>>>>> -    AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>> -    AMD_RESET_METHOD_MODE0,
>>>>>>>>> -    AMD_RESET_METHOD_MODE1,
>>>>>>>>> -    AMD_RESET_METHOD_MODE2,
>>>>>>>>> -    AMD_RESET_METHOD_BACO,
>>>>>>>>> -    AMD_RESET_METHOD_PCI,
>>>>>>>>> -};
>>>>>>>>> -
>>>>>>>>>   struct amdgpu_video_codec_info {
>>>>>>>>>       u32 codec_type;
>>>>>>>>>       u32 max_width;
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> index e582f1044c0f..7fa82269c30f 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> @@ -5201,6 +5201,12 @@ int amdgpu_device_gpu_recover_imp(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>       }
>>>>>>>>>         tmp_vram_lost_counter = 
>>>>>>>>> atomic_read(&((adev)->vram_lost_counter));
>>>>>>>>> +
>>>>>>>>> +    /* Drop all pending resets since we will reset now anyway */
>>>>>>>>> +    tmp_adev = list_first_entry(device_list_handle, struct 
>>>>>>>>> amdgpu_device,
>>>>>>>>> +                        reset_list);
>>>>>>>>> + amdgpu_reset_pending_list(tmp_adev->reset_domain);
>>>>>>>>> +
>>>>>>>>>       /* Actual ASIC resets if needed.*/
>>>>>>>>>       /* Host driver will handle XGMI hive reset for SRIOV */
>>>>>>>>>       if (amdgpu_sriov_vf(adev)) {
>>>>>>>>> @@ -5296,7 +5302,7 @@ int amdgpu_device_gpu_recover_imp(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>   }
>>>>>>>>>     struct amdgpu_recover_work_struct {
>>>>>>>>> -    struct work_struct base;
>>>>>>>>> +    struct amdgpu_reset_work_struct base;
>>>>>>>>>       struct amdgpu_device *adev;
>>>>>>>>>       struct amdgpu_job *job;
>>>>>>>>>       int ret;
>>>>>>>>> @@ -5304,7 +5310,7 @@ struct amdgpu_recover_work_struct {
>>>>>>>>>     static void amdgpu_device_queue_gpu_recover_work(struct 
>>>>>>>>> work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_recover_work_struct *recover_work = 
>>>>>>>>> container_of(work, struct amdgpu_recover_work_struct, base);
>>>>>>>>> +    struct amdgpu_recover_work_struct *recover_work = 
>>>>>>>>> container_of(work, struct amdgpu_recover_work_struct, 
>>>>>>>>> base.base.work);
>>>>>>>>>         recover_work->ret = 
>>>>>>>>> amdgpu_device_gpu_recover_imp(recover_work->adev, 
>>>>>>>>> recover_work->job);
>>>>>>>>>   }
>>>>>>>>> @@ -5316,12 +5322,15 @@ int amdgpu_device_gpu_recover(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>   {
>>>>>>>>>       struct amdgpu_recover_work_struct work = {.adev = adev, 
>>>>>>>>> .job = job};
>>>>>>>>>   -    INIT_WORK(&work.base, 
>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>> +    INIT_DELAYED_WORK(&work.base.base, 
>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>> +    INIT_LIST_HEAD(&work.base.node);
>>>>>>>>>         if (!amdgpu_reset_domain_schedule(adev->reset_domain, 
>>>>>>>>> &work.base))
>>>>>>>>>           return -EAGAIN;
>>>>>>>>>   -    flush_work(&work.base);
>>>>>>>>> +    flush_delayed_work(&work.base.base);
>>>>>>>>> +
>>>>>>>>> + amdgpu_reset_domain_del_pendning_work(adev->reset_domain, 
>>>>>>>>> &work.base);
>>>>>>>>>         return work.ret;
>>>>>>>>>   }
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> index c80af0889773..ffddd419c351 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> @@ -134,6 +134,9 @@ struct amdgpu_reset_domain 
>>>>>>>>> *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
>>>>>>>>>       atomic_set(&reset_domain->in_gpu_reset, 0);
>>>>>>>>>       init_rwsem(&reset_domain->sem);
>>>>>>>>>   + INIT_LIST_HEAD(&reset_domain->pending_works);
>>>>>>>>> +    mutex_init(&reset_domain->reset_lock);
>>>>>>>>> +
>>>>>>>>>       return reset_domain;
>>>>>>>>>   }
>>>>>>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> index 1949dbe28a86..863ec5720fc1 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> @@ -24,7 +24,18 @@
>>>>>>>>>   #ifndef __AMDGPU_RESET_H__
>>>>>>>>>   #define __AMDGPU_RESET_H__
>>>>>>>>>   -#include "amdgpu.h"
>>>>>>>>> +
>>>>>>>>> +#include <linux/atomic.h>
>>>>>>>>> +#include <linux/mutex.h>
>>>>>>>>> +#include <linux/list.h>
>>>>>>>>> +#include <linux/kref.h>
>>>>>>>>> +#include <linux/rwsem.h>
>>>>>>>>> +#include <linux/workqueue.h>
>>>>>>>>> +
>>>>>>>>> +struct amdgpu_device;
>>>>>>>>> +struct amdgpu_job;
>>>>>>>>> +struct amdgpu_hive_info;
>>>>>>>>> +
>>>>>>>>>     enum AMDGPU_RESET_FLAGS {
>>>>>>>>>   @@ -32,6 +43,17 @@ enum AMDGPU_RESET_FLAGS {
>>>>>>>>>       AMDGPU_SKIP_HW_RESET = 1,
>>>>>>>>>   };
>>>>>>>>>   +
>>>>>>>>> +enum amd_reset_method {
>>>>>>>>> +    AMD_RESET_METHOD_NONE = -1,
>>>>>>>>> +    AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>> +    AMD_RESET_METHOD_MODE0,
>>>>>>>>> +    AMD_RESET_METHOD_MODE1,
>>>>>>>>> +    AMD_RESET_METHOD_MODE2,
>>>>>>>>> +    AMD_RESET_METHOD_BACO,
>>>>>>>>> +    AMD_RESET_METHOD_PCI,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_context {
>>>>>>>>>       enum amd_reset_method method;
>>>>>>>>>       struct amdgpu_device *reset_req_dev;
>>>>>>>>> @@ -40,6 +62,8 @@ struct amdgpu_reset_context {
>>>>>>>>>       unsigned long flags;
>>>>>>>>>   };
>>>>>>>>>   +struct amdgpu_reset_control;
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_handler {
>>>>>>>>>       enum amd_reset_method reset_method;
>>>>>>>>>       struct list_head handler_list;
>>>>>>>>> @@ -76,12 +100,21 @@ enum amdgpu_reset_domain_type {
>>>>>>>>>       XGMI_HIVE
>>>>>>>>>   };
>>>>>>>>>   +
>>>>>>>>> +struct amdgpu_reset_work_struct {
>>>>>>>>> +    struct delayed_work base;
>>>>>>>>> +    struct list_head node;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_domain {
>>>>>>>>>       struct kref refcount;
>>>>>>>>>       struct workqueue_struct *wq;
>>>>>>>>>       enum amdgpu_reset_domain_type type;
>>>>>>>>>       struct rw_semaphore sem;
>>>>>>>>>       atomic_t in_gpu_reset;
>>>>>>>>> +
>>>>>>>>> +    struct list_head pending_works;
>>>>>>>>> +    struct mutex reset_lock;
>>>>>>>>>   };
>>>>>>>>>     @@ -113,9 +146,43 @@ static inline void 
>>>>>>>>> amdgpu_reset_put_reset_domain(struct amdgpu_reset_domain *dom
>>>>>>>>>   }
>>>>>>>>>     static inline bool amdgpu_reset_domain_schedule(struct 
>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>> -                        struct work_struct *work)
>>>>>>>>> +                        struct amdgpu_reset_work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    return queue_work(domain->wq, work);
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +
>>>>>>>>> +    if (!queue_delayed_work(domain->wq, &work->base, 0)) {
>>>>>>>>> +        mutex_unlock(&domain->reset_lock);
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    list_add_tail(&work->node, &domain->pending_works);
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>> +
>>>>>>>>> +    return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void 
>>>>>>>>> amdgpu_reset_domain_del_pendning_work(struct 
>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>> +                  struct amdgpu_reset_work_struct *work)
>>>>>>>>> +{
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +    list_del_init(&work->node);
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void amdgpu_reset_pending_list(struct 
>>>>>>>>> amdgpu_reset_domain *domain)
>>>>>>>>> +{
>>>>>>>>> +    struct amdgpu_reset_work_struct *entry, *tmp;
>>>>>>>>> +
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +    list_for_each_entry_safe(entry, tmp, 
>>>>>>>>> &domain->pending_works, node) {
>>>>>>>>> +
>>>>>>>>> +        list_del_init(&entry->node);
>>>>>>>>> +
>>>>>>>>> +        /* Stop any other related pending resets */
>>>>>>>>> +        cancel_delayed_work(&entry->base);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>>   }
>>>>>>>>>     void amdgpu_device_lock_reset_domain(struct 
>>>>>>>>> amdgpu_reset_domain *reset_domain);
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> index 239f232f9c02..574e870d3064 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>>>   #define AMDGPU_VIRT_H
>>>>>>>>>     #include "amdgv_sriovmsg.h"
>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>     #define AMDGPU_SRIOV_CAPS_SRIOV_VBIOS  (1 << 0) /* vBIOS is 
>>>>>>>>> sr-iov ready */
>>>>>>>>>   #define AMDGPU_SRIOV_CAPS_ENABLE_IOV   (1 << 1) /* sr-iov is 
>>>>>>>>> enabled on this GPU */
>>>>>>>>> @@ -230,7 +231,7 @@ struct amdgpu_virt {
>>>>>>>>>       uint32_t            reg_val_offs;
>>>>>>>>>       struct amdgpu_irq_src        ack_irq;
>>>>>>>>>       struct amdgpu_irq_src        rcv_irq;
>>>>>>>>> -    struct work_struct        flr_work;
>>>>>>>>> +    struct amdgpu_reset_work_struct flr_work;
>>>>>>>>>       struct amdgpu_mm_table        mm_table;
>>>>>>>>>       const struct amdgpu_virt_ops    *ops;
>>>>>>>>>       struct amdgpu_vf_error_buffer    vf_errors;
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> index b81acf59870c..f3d1c2be9292 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> @@ -251,7 +251,7 @@ static int 
>>>>>>>>> xgpu_ai_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>       int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>   @@ -380,7 +380,8 @@ int xgpu_ai_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_ai_mailbox_flr_work);
>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base, 
>>>>>>>>> xgpu_ai_mailbox_flr_work);
>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -389,6 +390,8 @@ void xgpu_ai_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> + amdgpu_reset_domain_del_pendning_work(adev->reset_domain, 
>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     static int xgpu_ai_request_init_data(struct amdgpu_device 
>>>>>>>>> *adev)
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> index 22c10b97ea81..927b3d5bb1d0 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> @@ -275,7 +275,7 @@ static int 
>>>>>>>>> xgpu_nv_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>       int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>   @@ -407,7 +407,8 @@ int xgpu_nv_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_nv_mailbox_flr_work);
>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base, 
>>>>>>>>> xgpu_nv_mailbox_flr_work);
>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -416,6 +417,8 @@ void xgpu_nv_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> + amdgpu_reset_domain_del_pendning_work(adev->reset_domain, 
>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     const struct amdgpu_virt_ops xgpu_nv_virt_ops = {
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> index 7b63d30b9b79..1d4ef5c70730 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> @@ -512,7 +512,7 @@ static int 
>>>>>>>>> xgpu_vi_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_vi_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>         /* wait until RCV_MSG become 3 */
>>>>>>>>> @@ -610,7 +610,8 @@ int xgpu_vi_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_vi_mailbox_flr_work);
>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base, 
>>>>>>>>> xgpu_vi_mailbox_flr_work);
>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -619,6 +620,8 @@ void xgpu_vi_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> + amdgpu_reset_domain_del_pendning_work(adev->reset_domain, 
>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>>>>>>>>
>>>>>>
>>>>

Regards,
-- 
Luben

