[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

Lazar, Lijo lijo.lazar at amd.com
Fri May 6 06:02:40 UTC 2022



On 5/6/2022 3:17 AM, Andrey Grodzovsky wrote:
> 
> On 2022-05-05 15:49, Felix Kuehling wrote:
>>
>> Am 2022-05-05 um 14:57 schrieb Andrey Grodzovsky:
>>>
>>> On 2022-05-05 11:06, Christian König wrote:
>>>> Am 05.05.22 um 15:54 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 2022-05-05 09:23, Christian König wrote:
>>>>>> Am 05.05.22 um 15:15 schrieb Andrey Grodzovsky:
>>>>>>> On 2022-05-05 06:09, Christian König wrote:
>>>>>>>
>>>>>>>> Am 04.05.22 um 18:18 schrieb Andrey Grodzovsky:
>>>>>>>>> Problem:
>>>>>>>>> During a hive reset caused by a command timing out on a ring,
>>>>>>>>> extra resets are triggered by KFD, which is unable to access
>>>>>>>>> registers on the resetting ASIC.
>>>>>>>>>
>>>>>>>>> Fix: Rework GPU reset to use a list of pending reset jobs
>>>>>>>>> such that the first reset job that actually resets the entire
>>>>>>>>> reset domain will cancel all the pending redundant resets.
>>>>>>>>>
>>>>>>>>> This is in line with what we already do for redundant TDRs
>>>>>>>>> in scheduler code.
>>>>>>>>
>>>>>>>> Mhm, why exactly do you need the extra linked list then?
>>>>>>>>
>>>>>>>> Let's talk about that on our call today.
>>>>>>>
>>>>>>>
>>>>>>> Going to miss it as you know, and also this is the place to 
>>>>>>> discuss technical questions anyway so -
>>>>>>
>>>>>> Good point.
>>>>>>
>>>>>>> It's needed because those other resets are not timeout handlers 
>>>>>>> governed by the scheduler but rather external resets triggered 
>>>>>>> by clients such as KFD, RAS and sysfs. The scheduler has no 
>>>>>>> knowledge of them (and should not have), but they are serialized 
>>>>>>> into the same wq as the TO handlers from the scheduler. It just 
>>>>>>> happens that a TO-triggered reset in turn causes another reset 
>>>>>>> (from KFD in this case), and we want to prevent this second 
>>>>>>> reset from taking place just as we want to avoid multiple TO 
>>>>>>> resets taking place in the scheduler code.
>>>>>>
>>>>>> Yeah, but why do you need multiple workers?
>>>>>>
>>>>>> You have a single worker in your adev for GPU resets not 
>>>>>> triggered by the scheduler, and you cancel that at the end of 
>>>>>> the reset procedure.
>>>>>>
>>>>>> If anybody thinks it needs to trigger another reset while in 
>>>>>> reset (which is actually a small design bug in itself), the 
>>>>>> reset will just be canceled in the same way we cancel the 
>>>>>> scheduler resets.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> Had this in mind at first, but then I realized that each client 
>>>>> (RAS, KFD and sysfs) will want to fill in its own data for the 
>>>>> work (see amdgpu_device_gpu_recover). For an XGMI hive each will 
>>>>> want to set its own adev (which is fine if you set up a work item 
>>>>> per adev as you suggest), but each client might also want to set 
>>>>> its own bad-job value (they all pass NULL there today, but maybe 
>>>>> not in the future), and here you might have a collision.
>>>>
>>>> Yeah, but that is intentional. Sometimes we have a job that needs 
>>>> to be consumed by the reset handler, and it must not be 
>>>> overwritten in the meantime.
>>>
>>>
>>> I am not sure why this is a requirement; multiple clients can 
>>> concurrently decide to trigger a reset (possibly for independent 
>>> reasons), hence they cannot share the same work struct to pass 
>>> their data.
>>
>> If those concurrent clients could detect that a reset was already in 
>> progress, you wouldn't need the complexity of multiple work structs 
>> being scheduled. You could simply return without triggering another 
>> reset.
> 
> 
> In my view the main problem with a single work struct, whether at the 
> reset-domain level or even at the adev level, is that in some cases we 
> optimize resets and don't actually perform an ASIC HW reset (see 
> amdgpu_job_timedout with soft recovery, and skip_hw_reset in 
> amdgpu_device_gpu_recover_imp for the case where the bad job gets 
> signaled just before we start the HW reset and we simply skip it). If 
> many different reset sources share the same work struct, the first one 
> to obtain the lock you describe below might opt out of a full HW reset 
> because its bad job signaled, for example, or because its hung IP 
> block was able to recover through a SW reset; but in the meantime 
> another reset source that needed an actual HW reset just silently 
> returned, and we end up with an unhandled reset request. True, today 
> this only happens with job-timeout reset sources, which are handled 
> from within the scheduler and won't use this single work struct, but 
> nothing prevents this from happening in the future. Also, if we 
> actually want to unify the scheduler timeout handlers within the 
> reset domain (which seems to me the right design approach), we won't 
> be able to use just one work struct, for this same reason.
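> 
> To make the failure mode concrete, a rough sketch (the helper names 
> here are made up purely for illustration):
> 
>     static void shared_reset_worker(struct work_struct *work)
>     {
>         /* The data in the single shared work struct belongs to
>          * whichever reset source won the race to schedule it. */
>         if (bad_job_already_signaled() || soft_reset_recovered())
>             return; /* the first source opts out of the HW reset,
>                      * but a concurrent RAS request that needed a
>                      * real HW reset was coalesced into this same
>                      * work item and is now silently dropped */
> 
>         do_full_hw_reset();
>     }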
> 

Just to add to this point - a reset domain is a co-operative domain. In 
addition to sharing a set of clients from various reset sources for one 
device, it will also have a set of devices, as in an XGMI hive. A job 
timeout on one device may not eventually result in a reset, but a RAS 
error happening on another device at the same time would need one. The 
second device's RAS error handler cannot just return on seeing that a 
reset work has already started, nor ignore the reset work because 
another device has already filled in the reset data.

When there is a reset domain, it should take care of the scheduled 
work; keeping that at the device or any other level doesn't sound good.

Thanks,
Lijo

> Andrey
> 
> 
>>
>> I'd put the reset work struct into the reset_domain struct. That way 
>> you'd have exactly one worker for the reset domain. You could 
>> implement a lock-less scheme to decide whether you need to schedule a 
>> reset, e.g. using an atomic counter in the shared work struct that 
>> gets incremented when a client wants to trigger a reset 
>> (atomic_add_return). If that counter is exactly 1 after incrementing, 
>> you need to fill in the rest of the work struct and schedule the work. 
>> Otherwise, it's already scheduled (or another client is in the process 
>> of scheduling it) and you just return. When the worker finishes (after 
>> confirming a successful reset), it resets the counter to 0, so the 
>> next client requesting a reset will schedule the worker again.
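>>
>> A minimal sketch of that scheme (the structure and function names 
>> here are illustrative, not driver code):
>>
>>     struct reset_domain_demo {
>>         struct workqueue_struct *wq;
>>         struct work_struct reset_work;
>>         atomic_t reset_pending;
>>     };
>>
>>     static bool demo_request_reset(struct reset_domain_demo *d)
>>     {
>>         /* Only the caller that moves the counter from 0 to 1 fills
>>          * in the work data and schedules the worker. */
>>         if (atomic_add_return(1, &d->reset_pending) != 1)
>>             return false;   /* reset already scheduled */
>>
>>         queue_work(d->wq, &d->reset_work);
>>         return true;
>>     }
>>
>>     static void demo_reset_worker(struct work_struct *work)
>>     {
>>         struct reset_domain_demo *d =
>>             container_of(work, struct reset_domain_demo, reset_work);
>>
>>         /* ... perform and confirm the reset ... */
>>
>>         /* Re-arm only after the reset is confirmed done, so the
>>          * next client to increment from 0 schedules us again. */
>>         atomic_set(&d->reset_pending, 0);
>>     }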
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>>
>>>>
>>>>
>>>> In addition to that, keep in mind that you can't allocate any 
>>>> memory before or during the GPU reset, nor wait for the reset to 
>>>> complete (so you can't allocate anything on the stack either).
>>>
>>>
>>> There is no dynamic allocation here. Regarding stack allocations - 
>>> we do them all the time when we call functions, even during GPU 
>>> resets; how is the on-stack allocation of a work struct in 
>>> amdgpu_device_gpu_recover different from any other local variable 
>>> we allocate in any function we call?
>>>
>>> I am also not sure why it's not allowed to wait for the reset to 
>>> complete? Also, see amdgpu_ras_do_recovery and gpu_recover_get 
>>> (debugfs) - the caller expects the reset to complete before 
>>> returning. I can probably work around it in the RAS code by calling 
>>> atomic_set(&ras->in_recovery, 0) from some callback within the 
>>> actual reset function, but the sysfs path actually expects a 
>>> returned result indicating whether the call was successful or not.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> I don't think the concept you are trying here will work.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Also, in general it seems to me a cleaner approach to have this 
>>>>> logic (the work items) held and handled in the reset_domain, 
>>>>> rather than split across each adev or any other entity. We might 
>>>>> even want to move the scheduler handling into the reset domain in 
>>>>> the future, since the reset domain is supposed to be a generic 
>>>>> thing and not AMD-only.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>>>> Tested-by: Bai Zoy <Zoy.Bai at amd.com>
>>>>>>>>> ---
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 11 +---
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  |  3 +
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 73 +++++++++++++++++++++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  3 +-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c      |  7 ++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c      |  7 ++-
>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c      |  7 ++-
>>>>>>>>>   8 files changed, 104 insertions(+), 24 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> index 4264abc5604d..99efd8317547 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>> @@ -109,6 +109,7 @@
>>>>>>>>>   #include "amdgpu_fdinfo.h"
>>>>>>>>>   #include "amdgpu_mca.h"
>>>>>>>>>   #include "amdgpu_ras.h"
>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>     #define MAX_GPU_INSTANCE        16
>>>>>>>>>   @@ -509,16 +510,6 @@ struct amdgpu_allowed_register_entry {
>>>>>>>>>       bool grbm_indexed;
>>>>>>>>>   };
>>>>>>>>>   -enum amd_reset_method {
>>>>>>>>> -    AMD_RESET_METHOD_NONE = -1,
>>>>>>>>> -    AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>> -    AMD_RESET_METHOD_MODE0,
>>>>>>>>> -    AMD_RESET_METHOD_MODE1,
>>>>>>>>> -    AMD_RESET_METHOD_MODE2,
>>>>>>>>> -    AMD_RESET_METHOD_BACO,
>>>>>>>>> -    AMD_RESET_METHOD_PCI,
>>>>>>>>> -};
>>>>>>>>> -
>>>>>>>>>   struct amdgpu_video_codec_info {
>>>>>>>>>       u32 codec_type;
>>>>>>>>>       u32 max_width;
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> index e582f1044c0f..7fa82269c30f 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>> @@ -5201,6 +5201,12 @@ int amdgpu_device_gpu_recover_imp(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>       }
>>>>>>>>>         tmp_vram_lost_counter = 
>>>>>>>>> atomic_read(&((adev)->vram_lost_counter));
>>>>>>>>> +
>>>>>>>>> +    /* Drop all pending resets since we will reset now anyway */
>>>>>>>>> +    tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
>>>>>>>>> +                        reset_list);
>>>>>>>>> +    amdgpu_reset_pending_list(tmp_adev->reset_domain);
>>>>>>>>> +
>>>>>>>>>       /* Actual ASIC resets if needed.*/
>>>>>>>>>       /* Host driver will handle XGMI hive reset for SRIOV */
>>>>>>>>>       if (amdgpu_sriov_vf(adev)) {
>>>>>>>>> @@ -5296,7 +5302,7 @@ int amdgpu_device_gpu_recover_imp(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>   }
>>>>>>>>>     struct amdgpu_recover_work_struct {
>>>>>>>>> -    struct work_struct base;
>>>>>>>>> +    struct amdgpu_reset_work_struct base;
>>>>>>>>>       struct amdgpu_device *adev;
>>>>>>>>>       struct amdgpu_job *job;
>>>>>>>>>       int ret;
>>>>>>>>> @@ -5304,7 +5310,7 @@ struct amdgpu_recover_work_struct {
>>>>>>>>>     static void amdgpu_device_queue_gpu_recover_work(struct 
>>>>>>>>> work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_recover_work_struct *recover_work = 
>>>>>>>>> container_of(work, struct amdgpu_recover_work_struct, base);
>>>>>>>>> +    struct amdgpu_recover_work_struct *recover_work = 
>>>>>>>>> container_of(work, struct amdgpu_recover_work_struct, 
>>>>>>>>> base.base.work);
>>>>>>>>>         recover_work->ret = 
>>>>>>>>> amdgpu_device_gpu_recover_imp(recover_work->adev, 
>>>>>>>>> recover_work->job);
>>>>>>>>>   }
>>>>>>>>> @@ -5316,12 +5322,15 @@ int amdgpu_device_gpu_recover(struct 
>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>   {
>>>>>>>>>       struct amdgpu_recover_work_struct work = {.adev = adev, 
>>>>>>>>> .job = job};
>>>>>>>>>   -    INIT_WORK(&work.base, 
>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>> +    INIT_DELAYED_WORK(&work.base.base, amdgpu_device_queue_gpu_recover_work);
>>>>>>>>> +    INIT_LIST_HEAD(&work.base.node);
>>>>>>>>>         if (!amdgpu_reset_domain_schedule(adev->reset_domain, 
>>>>>>>>> &work.base))
>>>>>>>>>           return -EAGAIN;
>>>>>>>>>   -    flush_work(&work.base);
>>>>>>>>> +    flush_delayed_work(&work.base.base);
>>>>>>>>> +
>>>>>>>>> +    amdgpu_reset_domain_del_pendning_work(adev->reset_domain, &work.base);
>>>>>>>>>         return work.ret;
>>>>>>>>>   }
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> index c80af0889773..ffddd419c351 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>> @@ -134,6 +134,9 @@ struct amdgpu_reset_domain 
>>>>>>>>> *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
>>>>>>>>>       atomic_set(&reset_domain->in_gpu_reset, 0);
>>>>>>>>>       init_rwsem(&reset_domain->sem);
>>>>>>>>> +    INIT_LIST_HEAD(&reset_domain->pending_works);
>>>>>>>>> +    mutex_init(&reset_domain->reset_lock);
>>>>>>>>> +
>>>>>>>>>       return reset_domain;
>>>>>>>>>   }
>>>>>>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> index 1949dbe28a86..863ec5720fc1 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>> @@ -24,7 +24,18 @@
>>>>>>>>>   #ifndef __AMDGPU_RESET_H__
>>>>>>>>>   #define __AMDGPU_RESET_H__
>>>>>>>>>   -#include "amdgpu.h"
>>>>>>>>> +
>>>>>>>>> +#include <linux/atomic.h>
>>>>>>>>> +#include <linux/mutex.h>
>>>>>>>>> +#include <linux/list.h>
>>>>>>>>> +#include <linux/kref.h>
>>>>>>>>> +#include <linux/rwsem.h>
>>>>>>>>> +#include <linux/workqueue.h>
>>>>>>>>> +
>>>>>>>>> +struct amdgpu_device;
>>>>>>>>> +struct amdgpu_job;
>>>>>>>>> +struct amdgpu_hive_info;
>>>>>>>>> +
>>>>>>>>>     enum AMDGPU_RESET_FLAGS {
>>>>>>>>>   @@ -32,6 +43,17 @@ enum AMDGPU_RESET_FLAGS {
>>>>>>>>>       AMDGPU_SKIP_HW_RESET = 1,
>>>>>>>>>   };
>>>>>>>>>   +
>>>>>>>>> +enum amd_reset_method {
>>>>>>>>> +    AMD_RESET_METHOD_NONE = -1,
>>>>>>>>> +    AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>> +    AMD_RESET_METHOD_MODE0,
>>>>>>>>> +    AMD_RESET_METHOD_MODE1,
>>>>>>>>> +    AMD_RESET_METHOD_MODE2,
>>>>>>>>> +    AMD_RESET_METHOD_BACO,
>>>>>>>>> +    AMD_RESET_METHOD_PCI,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_context {
>>>>>>>>>       enum amd_reset_method method;
>>>>>>>>>       struct amdgpu_device *reset_req_dev;
>>>>>>>>> @@ -40,6 +62,8 @@ struct amdgpu_reset_context {
>>>>>>>>>       unsigned long flags;
>>>>>>>>>   };
>>>>>>>>>   +struct amdgpu_reset_control;
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_handler {
>>>>>>>>>       enum amd_reset_method reset_method;
>>>>>>>>>       struct list_head handler_list;
>>>>>>>>> @@ -76,12 +100,21 @@ enum amdgpu_reset_domain_type {
>>>>>>>>>       XGMI_HIVE
>>>>>>>>>   };
>>>>>>>>>   +
>>>>>>>>> +struct amdgpu_reset_work_struct {
>>>>>>>>> +    struct delayed_work base;
>>>>>>>>> +    struct list_head node;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>   struct amdgpu_reset_domain {
>>>>>>>>>       struct kref refcount;
>>>>>>>>>       struct workqueue_struct *wq;
>>>>>>>>>       enum amdgpu_reset_domain_type type;
>>>>>>>>>       struct rw_semaphore sem;
>>>>>>>>>       atomic_t in_gpu_reset;
>>>>>>>>> +
>>>>>>>>> +    struct list_head pending_works;
>>>>>>>>> +    struct mutex reset_lock;
>>>>>>>>>   };
>>>>>>>>>     @@ -113,9 +146,43 @@ static inline void 
>>>>>>>>> amdgpu_reset_put_reset_domain(struct amdgpu_reset_domain *dom
>>>>>>>>>   }
>>>>>>>>>     static inline bool amdgpu_reset_domain_schedule(struct 
>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>> -                        struct work_struct *work)
>>>>>>>>> +                        struct amdgpu_reset_work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    return queue_work(domain->wq, work);
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +
>>>>>>>>> +    if (!queue_delayed_work(domain->wq, &work->base, 0)) {
>>>>>>>>> +        mutex_unlock(&domain->reset_lock);
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    list_add_tail(&work->node, &domain->pending_works);
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>> +
>>>>>>>>> +    return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void 
>>>>>>>>> amdgpu_reset_domain_del_pendning_work(struct 
>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>> +                  struct amdgpu_reset_work_struct *work)
>>>>>>>>> +{
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +    list_del_init(&work->node);
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static inline void amdgpu_reset_pending_list(struct 
>>>>>>>>> amdgpu_reset_domain *domain)
>>>>>>>>> +{
>>>>>>>>> +    struct amdgpu_reset_work_struct *entry, *tmp;
>>>>>>>>> +
>>>>>>>>> +    mutex_lock(&domain->reset_lock);
>>>>>>>>> +    list_for_each_entry_safe(entry, tmp, 
>>>>>>>>> &domain->pending_works, node) {
>>>>>>>>> +
>>>>>>>>> +        list_del_init(&entry->node);
>>>>>>>>> +
>>>>>>>>> +        /* Stop any other related pending resets */
>>>>>>>>> +        cancel_delayed_work(&entry->base);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    mutex_unlock(&domain->reset_lock);
>>>>>>>>>   }
>>>>>>>>>     void amdgpu_device_lock_reset_domain(struct 
>>>>>>>>> amdgpu_reset_domain *reset_domain);
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> index 239f232f9c02..574e870d3064 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>>>   #define AMDGPU_VIRT_H
>>>>>>>>>     #include "amdgv_sriovmsg.h"
>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>     #define AMDGPU_SRIOV_CAPS_SRIOV_VBIOS  (1 << 0) /* vBIOS is 
>>>>>>>>> sr-iov ready */
>>>>>>>>>   #define AMDGPU_SRIOV_CAPS_ENABLE_IOV   (1 << 1) /* sr-iov is 
>>>>>>>>> enabled on this GPU */
>>>>>>>>> @@ -230,7 +231,7 @@ struct amdgpu_virt {
>>>>>>>>>       uint32_t            reg_val_offs;
>>>>>>>>>       struct amdgpu_irq_src        ack_irq;
>>>>>>>>>       struct amdgpu_irq_src        rcv_irq;
>>>>>>>>> -    struct work_struct        flr_work;
>>>>>>>>> +    struct amdgpu_reset_work_struct flr_work;
>>>>>>>>>       struct amdgpu_mm_table        mm_table;
>>>>>>>>>       const struct amdgpu_virt_ops    *ops;
>>>>>>>>>       struct amdgpu_vf_error_buffer    vf_errors;
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> index b81acf59870c..f3d1c2be9292 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>> @@ -251,7 +251,7 @@ static int 
>>>>>>>>> xgpu_ai_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>       int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>   @@ -380,7 +380,8 @@ int xgpu_ai_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_ai_mailbox_flr_work);
>>>>>>>>> +    INIT_DELAYED_WORK(&adev->virt.flr_work.base, xgpu_ai_mailbox_flr_work);
>>>>>>>>> +    INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -389,6 +390,8 @@ void xgpu_ai_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> +    amdgpu_reset_domain_del_pendning_work(adev->reset_domain, &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     static int xgpu_ai_request_init_data(struct amdgpu_device 
>>>>>>>>> *adev)
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> index 22c10b97ea81..927b3d5bb1d0 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>> @@ -275,7 +275,7 @@ static int 
>>>>>>>>> xgpu_nv_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>       int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>   @@ -407,7 +407,8 @@ int xgpu_nv_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_nv_mailbox_flr_work);
>>>>>>>>> +    INIT_DELAYED_WORK(&adev->virt.flr_work.base, xgpu_nv_mailbox_flr_work);
>>>>>>>>> +    INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -416,6 +417,8 @@ void xgpu_nv_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> +    amdgpu_reset_domain_del_pendning_work(adev->reset_domain, &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     const struct amdgpu_virt_ops xgpu_nv_virt_ops = {
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c 
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> index 7b63d30b9b79..1d4ef5c70730 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>> @@ -512,7 +512,7 @@ static int 
>>>>>>>>> xgpu_vi_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>     static void xgpu_vi_mailbox_flr_work(struct work_struct *work)
>>>>>>>>>   {
>>>>>>>>> -    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work);
>>>>>>>>> +    struct amdgpu_virt *virt = container_of(work, struct 
>>>>>>>>> amdgpu_virt, flr_work.base.work);
>>>>>>>>>       struct amdgpu_device *adev = container_of(virt, struct 
>>>>>>>>> amdgpu_device, virt);
>>>>>>>>>         /* wait until RCV_MSG become 3 */
>>>>>>>>> @@ -610,7 +610,8 @@ int xgpu_vi_mailbox_get_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>           return r;
>>>>>>>>>       }
>>>>>>>>>   -    INIT_WORK(&adev->virt.flr_work, xgpu_vi_mailbox_flr_work);
>>>>>>>>> +    INIT_DELAYED_WORK(&adev->virt.flr_work.base, xgpu_vi_mailbox_flr_work);
>>>>>>>>> +    INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>         return 0;
>>>>>>>>>   }
>>>>>>>>> @@ -619,6 +620,8 @@ void xgpu_vi_mailbox_put_irq(struct 
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>   {
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>       amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>> +
>>>>>>>>> +    amdgpu_reset_domain_del_pendning_work(adev->reset_domain, &adev->virt.flr_work);
>>>>>>>>>   }
>>>>>>>>>     const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>>>>>>>>
>>>>>>
>>>>

