[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.
Lazar, Lijo
lijo.lazar at amd.com
Wed May 11 15:20:01 UTC 2022
On 5/11/2022 7:28 PM, Christian König wrote:
> Am 11.05.22 um 15:43 schrieb Andrey Grodzovsky:
>> On 2022-05-11 03:38, Christian König wrote:
>>> Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>> E.g. in the reset code (either before or after the reset, that's
>>>>> debatable) you do something like this:
>>>>>
>>>>> for (i = 0; i < num_ring; ++i)
>>>>> cancel_delayed_work(ring[i]->scheduler....)
>>>>> cancel_work(adev->ras_work);
>>>>> cancel_work(adev->iofault_work);
>>>>> cancel_work(adev->debugfs_work);
>>>>> ...
>>>>>
>>>>> You don't really need to track which reset source has fired and
>>>>> which hasn't, because that would be racy again. Instead just
>>>>> bluntly reset all possible sources.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> I don't say we care if it fired once or twice (I need to add a fix
>>>> to only insert reset work to pending reset list if it's not already
>>>> there), the point of using list (or array) to which you add and from
>>>> which you remove is that the logic of this is encapsulated within
>>>> reset domain. In your way we need to be aware who exactly schedules
>>>> reset work and explicitly cancel them, this also means that for any
>>>> new source added in the future you will need to remember to add him
>>>
>>> I don't think that this is a valid argument. Additionally to the
>>> schedulers we probably just need less than a handful of reset
>>> sources, most likely even just one or two is enough.
>>>
>>> The only justification I can see of having additional separate reset
>>> sources would be if somebody wants to know if a specific source has
>>> been handled or not (e.g. call flush_work() or work_pending()). Like
>>> in the case of a reset triggered through debugfs.
>>
>>
>> This is indeed one reason, another is as we said before that if you
>> share 'reset source' (meaning a delayed work) with another client
>> (i.e. RAS and KFD) it means you make assumption that the other client
>> always proceeds with the
>> reset exactly the same way as you expect. So today we have this only
>> in scheduler vs non scheduler reset happening - non scheduler reset
>> clients assume the reset is always fully executed in HW while
>> scheduler based reset makes shortcuts and not always does HW reset
>> hence they cannot share 'reset source' (delayed work). Yes, we can
>> always add this in the future if and when such problem will arise but
>> no one will remember this then and a new bug will be introduced and
>> will take time to find and resolve.
>
> Mhm, so your main concern is that we forget to correctly handle the new
> reset sources?
>
> How about we do it like this then:
>
> struct amdgpu_reset_domain {
> ....
> union {
> struct {
> struct work_item debugfs;
> struct work_item ras;
> ....
> };
> struct work_item array[]
> } reset_sources;
> }
>
If it's only about static array,
enum amdgpu_reset_soruce {
AMDGPU_RESET_SRC_RAS,
AMDGPU_RESET_SRC_ABC,
.....
AMDGPU_RESET_SRC_XYZ,
AMDGPU_RESET_SRC_MAX
};
struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for
each work item
Thanks,
Lijo
> Not 100% sure if that works, but something like that should do the trick.
>
> My main concern is that I don't want to allocate the work items on the
> stack and dynamic allocation (e.g. kmalloc) is usually not possible either.
>
> Additional to that putting/removing work items from a list, array or
> other container is a very common source for race conditions.
>
> Regards,
> Christian.
>
>>
>>>> to the cancellation list which you showed above. In current way all
>>>> this done automatically within reset_domain code and it's agnostic
>>>> to specific driver and it's specific list of reset sources. Also in
>>>> case we would want to generalize reset_domain to other GPU drivers
>>>> (which was
>>>> a plan as far as i remember) this explicit mention of each reset
>>>> works for cancellation is again less suitable in my opinion.
>>>
>>> Well we could put the work item for the scheduler independent reset
>>> source into the reset domain as well. But I'm not sure those
>>> additional reset sources should be part of any common handling, that
>>> is largely amdgpu specific.
>>
>>
>> So it's for sure more then one source for the reasons described above,
>> also note that for scheduler we already cancel delayed work in
>> drm_sched_stop so calling them again in amdgpu code kind of superfluous.
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The only difference is I chose to do the canceling right BEFORE
>>>>>>>> the HW reset and not AFTER. I did this because I see a possible
>>>>>>>> race where a new reset request is being generated exactly after
>>>>>>>> we finished the HW reset but before we canceled out all pending
>>>>>>>> resets - in such case you wold not want to cancel this 'border
>>>>>>>> line new' reset request.
>>>>>>>
>>>>>>> Why not? Any new reset request directly after a hardware reset is
>>>>>>> most likely just falsely generated by the reset itself.
>>>>>>>
>>>>>>> Ideally I would cancel all sources after the reset, but before
>>>>>>> starting any new work.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>> You can see that if many different reset sources share same
>>>>>>>>>>> work struct what can happen is that the first to obtain the
>>>>>>>>>>> lock you describe bellow might opt out from full HW reset
>>>>>>>>>>> because his bad job did signal for example or because his
>>>>>>>>>>> hunged IP block was able to recover through SW reset but in
>>>>>>>>>>> the meantime another reset source who needed an actual HW
>>>>>>>>>>> reset just silently returned and we end up with unhandled
>>>>>>>>>>> reset request. True that today this happens only to job
>>>>>>>>>>> timeout reset sources that are handled form within the
>>>>>>>>>>> scheduler and won't use this single work struct but no one
>>>>>>>>>>> prevents a future case for this to happen and also, if we
>>>>>>>>>>> actually want to unify scheduler time out handlers within
>>>>>>>>>>> reset domain (which seems to me the right design approach) we
>>>>>>>>>>> won't be able to use just one work struct for this reason
>>>>>>>>>>> anyway.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Just to add to this point - a reset domain is co-operative
>>>>>>>>>> domain. In addition to sharing a set of clients from various
>>>>>>>>>> reset sources for one device, it also will have a set of
>>>>>>>>>> devices like in XGMI hive. The job timeout on one device may
>>>>>>>>>> not eventually result in result, but a RAS error happening on
>>>>>>>>>> another device at the same time would need a reset. The second
>>>>>>>>>> device's RAS error cannot return seeing that a reset work
>>>>>>>>>> already started, or ignore the reset work given that another
>>>>>>>>>> device has filled the reset data.
>>>>>>>>>>
>>>>>>>>>> When there is a reset domain, it should take care of the work
>>>>>>>>>> scheduled and keeping it in device or any other level doesn't
>>>>>>>>>> sound good.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Lijo
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'd put the reset work struct into the reset_domain struct.
>>>>>>>>>>>> That way you'd have exactly one worker for the reset domain.
>>>>>>>>>>>> You could implement a lock-less scheme to decide whether you
>>>>>>>>>>>> need to schedule a reset, e.g. using an atomic counter in
>>>>>>>>>>>> the shared work struct that gets incremented when a client
>>>>>>>>>>>> wants to trigger a reset (atomic_add_return). If that
>>>>>>>>>>>> counter is exactly 1 after incrementing, you need to fill in
>>>>>>>>>>>> the rest of the work struct and schedule the work.
>>>>>>>>>>>> Otherwise, it's already scheduled (or another client is in
>>>>>>>>>>>> the process of scheduling it) and you just return. When the
>>>>>>>>>>>> worker finishes (after confirming a successful reset), it
>>>>>>>>>>>> resets the counter to 0, so the next client requesting a
>>>>>>>>>>>> reset will schedule the worker again.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Felix
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Additional to that keep in mind that you can't allocate
>>>>>>>>>>>>>> any memory before or during the GPU reset nor wait for the
>>>>>>>>>>>>>> reset to complete (so you can't allocate anything on the
>>>>>>>>>>>>>> stack either).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no dynamic allocation here, regarding stack
>>>>>>>>>>>>> allocations - we do it all the time when we call functions,
>>>>>>>>>>>>> even during GPU resets, how on stack allocation of work
>>>>>>>>>>>>> struct in amdgpu_device_gpu_recover is different from any
>>>>>>>>>>>>> other local variable we allocate in any function we call ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am also not sure why it's not allowed to wait for reset
>>>>>>>>>>>>> to complete ? Also, see in amdgpu_ras_do_recovery and
>>>>>>>>>>>>> gpu_recover_get (debugfs) - the caller expects the reset to
>>>>>>>>>>>>> complete before he returns. I can probably work around it
>>>>>>>>>>>>> in RAS code by calling atomic_set(&ras->in_recovery, 0)
>>>>>>>>>>>>> from some callback within actual reset function but
>>>>>>>>>>>>> regarding sysfs it actually expects a result returned
>>>>>>>>>>>>> indicating whether the call was successful or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think that concept you try here will work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also in general seems to me it's cleaner approach where
>>>>>>>>>>>>>>> this logic (the work items) are held and handled in
>>>>>>>>>>>>>>> reset_domain and are not split in each adev or any other
>>>>>>>>>>>>>>> entity. We might want in the future to even move the
>>>>>>>>>>>>>>> scheduler handling into reset domain since reset domain
>>>>>>>>>>>>>>> is supposed to be a generic things and not only or AMD.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky
>>>>>>>>>>>>>>>>>>> <andrey.grodzovsky at amd.com>
>>>>>>>>>>>>>>>>>>> Tested-by: Bai Zoy <Zoy.Bai at amd.com>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 +---
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 3 +
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 73
>>>>>>>>>>>>>>>>>>> +++++++++++++++++++++-
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 +-
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 7 ++-
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 7 ++-
>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 7 ++-
>>>>>>>>>>>>>>>>>>> 8 files changed, 104 insertions(+), 24 deletions(-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>> index 4264abc5604d..99efd8317547 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>> @@ -109,6 +109,7 @@
>>>>>>>>>>>>>>>>>>> #include "amdgpu_fdinfo.h"
>>>>>>>>>>>>>>>>>>> #include "amdgpu_mca.h"
>>>>>>>>>>>>>>>>>>> #include "amdgpu_ras.h"
>>>>>>>>>>>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>>>>>>>>>>> #define MAX_GPU_INSTANCE 16
>>>>>>>>>>>>>>>>>>> @@ -509,16 +510,6 @@ struct
>>>>>>>>>>>>>>>>>>> amdgpu_allowed_register_entry {
>>>>>>>>>>>>>>>>>>> bool grbm_indexed;
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>> -enum amd_reset_method {
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_NONE = -1,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE0,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE1,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE2,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_BACO,
>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_PCI,
>>>>>>>>>>>>>>>>>>> -};
>>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>>> struct amdgpu_video_codec_info {
>>>>>>>>>>>>>>>>>>> u32 codec_type;
>>>>>>>>>>>>>>>>>>> u32 max_width;
>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>> index e582f1044c0f..7fa82269c30f 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>> @@ -5201,6 +5201,12 @@ int
>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(struct amdgpu_device
>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> tmp_vram_lost_counter =
>>>>>>>>>>>>>>>>>>> atomic_read(&((adev)->vram_lost_counter));
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + /* Drop all pending resets since we will reset
>>>>>>>>>>>>>>>>>>> now anyway */
>>>>>>>>>>>>>>>>>>> + tmp_adev = list_first_entry(device_list_handle,
>>>>>>>>>>>>>>>>>>> struct amdgpu_device,
>>>>>>>>>>>>>>>>>>> + reset_list);
>>>>>>>>>>>>>>>>>>> + amdgpu_reset_pending_list(tmp_adev->reset_domain);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> /* Actual ASIC resets if needed.*/
>>>>>>>>>>>>>>>>>>> /* Host driver will handle XGMI hive reset for
>>>>>>>>>>>>>>>>>>> SRIOV */
>>>>>>>>>>>>>>>>>>> if (amdgpu_sriov_vf(adev)) {
>>>>>>>>>>>>>>>>>>> @@ -5296,7 +5302,7 @@ int
>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(struct amdgpu_device
>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> struct amdgpu_recover_work_struct {
>>>>>>>>>>>>>>>>>>> - struct work_struct base;
>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct base;
>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev;
>>>>>>>>>>>>>>>>>>> struct amdgpu_job *job;
>>>>>>>>>>>>>>>>>>> int ret;
>>>>>>>>>>>>>>>>>>> @@ -5304,7 +5310,7 @@ struct
>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct {
>>>>>>>>>>>>>>>>>>> static void
>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work(struct
>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> - struct amdgpu_recover_work_struct *recover_work
>>>>>>>>>>>>>>>>>>> = container_of(work, struct
>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct, base);
>>>>>>>>>>>>>>>>>>> + struct amdgpu_recover_work_struct *recover_work
>>>>>>>>>>>>>>>>>>> = container_of(work, struct
>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct, base.base.work);
>>>>>>>>>>>>>>>>>>> recover_work->ret =
>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(recover_work->adev,
>>>>>>>>>>>>>>>>>>> recover_work->job);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> @@ -5316,12 +5322,15 @@ int
>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> struct amdgpu_recover_work_struct work = {.adev
>>>>>>>>>>>>>>>>>>> = adev, .job = job};
>>>>>>>>>>>>>>>>>>> - INIT_WORK(&work.base,
>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&work.base.base,
>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&work.base.node);
>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>> (!amdgpu_reset_domain_schedule(adev->reset_domain,
>>>>>>>>>>>>>>>>>>> &work.base))
>>>>>>>>>>>>>>>>>>> return -EAGAIN;
>>>>>>>>>>>>>>>>>>> - flush_work(&work.base);
>>>>>>>>>>>>>>>>>>> + flush_delayed_work(&work.base.base);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>> &work.base);
>>>>>>>>>>>>>>>>>>> return work.ret;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>> index c80af0889773..ffddd419c351 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>> @@ -134,6 +134,9 @@ struct amdgpu_reset_domain
>>>>>>>>>>>>>>>>>>> *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
>>>>>>>>>>>>>>>>>>> atomic_set(&reset_domain->in_gpu_reset, 0);
>>>>>>>>>>>>>>>>>>> init_rwsem(&reset_domain->sem);
>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&reset_domain->pending_works);
>>>>>>>>>>>>>>>>>>> + mutex_init(&reset_domain->reset_lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> return reset_domain;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>> index 1949dbe28a86..863ec5720fc1 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>> @@ -24,7 +24,18 @@
>>>>>>>>>>>>>>>>>>> #ifndef __AMDGPU_RESET_H__
>>>>>>>>>>>>>>>>>>> #define __AMDGPU_RESET_H__
>>>>>>>>>>>>>>>>>>> -#include "amdgpu.h"
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +#include <linux/atomic.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/mutex.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/list.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/kref.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/rwsem.h>
>>>>>>>>>>>>>>>>>>> +#include <linux/workqueue.h>
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +struct amdgpu_device;
>>>>>>>>>>>>>>>>>>> +struct amdgpu_job;
>>>>>>>>>>>>>>>>>>> +struct amdgpu_hive_info;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> enum AMDGPU_RESET_FLAGS {
>>>>>>>>>>>>>>>>>>> @@ -32,6 +43,17 @@ enum AMDGPU_RESET_FLAGS {
>>>>>>>>>>>>>>>>>>> AMDGPU_SKIP_HW_RESET = 1,
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +enum amd_reset_method {
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_NONE = -1,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE0,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE1,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE2,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_BACO,
>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_PCI,
>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_context {
>>>>>>>>>>>>>>>>>>> enum amd_reset_method method;
>>>>>>>>>>>>>>>>>>> struct amdgpu_device *reset_req_dev;
>>>>>>>>>>>>>>>>>>> @@ -40,6 +62,8 @@ struct amdgpu_reset_context {
>>>>>>>>>>>>>>>>>>> unsigned long flags;
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>> +struct amdgpu_reset_control;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_handler {
>>>>>>>>>>>>>>>>>>> enum amd_reset_method reset_method;
>>>>>>>>>>>>>>>>>>> struct list_head handler_list;
>>>>>>>>>>>>>>>>>>> @@ -76,12 +100,21 @@ enum amdgpu_reset_domain_type {
>>>>>>>>>>>>>>>>>>> XGMI_HIVE
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +struct amdgpu_reset_work_struct {
>>>>>>>>>>>>>>>>>>> + struct delayed_work base;
>>>>>>>>>>>>>>>>>>> + struct list_head node;
>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_domain {
>>>>>>>>>>>>>>>>>>> struct kref refcount;
>>>>>>>>>>>>>>>>>>> struct workqueue_struct *wq;
>>>>>>>>>>>>>>>>>>> enum amdgpu_reset_domain_type type;
>>>>>>>>>>>>>>>>>>> struct rw_semaphore sem;
>>>>>>>>>>>>>>>>>>> atomic_t in_gpu_reset;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + struct list_head pending_works;
>>>>>>>>>>>>>>>>>>> + struct mutex reset_lock;
>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>> @@ -113,9 +146,43 @@ static inline void
>>>>>>>>>>>>>>>>>>> amdgpu_reset_put_reset_domain(struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *dom
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> static inline bool
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_schedule(struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>>>>>>>>>>>> - struct work_struct *work)
>>>>>>>>>>>>>>>>>>> + struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_work_struct *work)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> - return queue_work(domain->wq, work);
>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + if (!queue_delayed_work(domain->wq, &work->base,
>>>>>>>>>>>>>>>>>>> 0)) {
>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + list_add_tail(&work->node, &domain->pending_works);
>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + return true;
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct
>>>>>>>>>>>>>>>>>>> *work)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> + list_del_init(&work->node);
>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +static inline void amdgpu_reset_pending_list(struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain)
>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct *entry, *tmp;
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> + list_for_each_entry_safe(entry, tmp,
>>>>>>>>>>>>>>>>>>> &domain->pending_works, node) {
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + list_del_init(&entry->node);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + /* Stop any other related pending resets */
>>>>>>>>>>>>>>>>>>> + cancel_delayed_work(&entry->base);
>>>>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> void amdgpu_device_lock_reset_domain(struct
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *reset_domain);
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>> index 239f232f9c02..574e870d3064 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>>>>>>>>>>>>> #define AMDGPU_VIRT_H
>>>>>>>>>>>>>>>>>>> #include "amdgv_sriovmsg.h"
>>>>>>>>>>>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>>>>>>>>>>> #define AMDGPU_SRIOV_CAPS_SRIOV_VBIOS (1 << 0) /*
>>>>>>>>>>>>>>>>>>> vBIOS is sr-iov ready */
>>>>>>>>>>>>>>>>>>> #define AMDGPU_SRIOV_CAPS_ENABLE_IOV (1 << 1) /*
>>>>>>>>>>>>>>>>>>> sr-iov is enabled on this GPU */
>>>>>>>>>>>>>>>>>>> @@ -230,7 +231,7 @@ struct amdgpu_virt {
>>>>>>>>>>>>>>>>>>> uint32_t reg_val_offs;
>>>>>>>>>>>>>>>>>>> struct amdgpu_irq_src ack_irq;
>>>>>>>>>>>>>>>>>>> struct amdgpu_irq_src rcv_irq;
>>>>>>>>>>>>>>>>>>> - struct work_struct flr_work;
>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct flr_work;
>>>>>>>>>>>>>>>>>>> struct amdgpu_mm_table mm_table;
>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops *ops;
>>>>>>>>>>>>>>>>>>> struct amdgpu_vf_error_buffer vf_errors;
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>> index b81acf59870c..f3d1c2be9292 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>> @@ -251,7 +251,7 @@ static int
>>>>>>>>>>>>>>>>>>> xgpu_ai_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>> static void xgpu_ai_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev = container_of(virt,
>>>>>>>>>>>>>>>>>>> struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>> int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>>>>>>>>>>> @@ -380,7 +380,8 @@ int
>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> @@ -389,6 +390,8 @@ void
>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> static int xgpu_ai_request_init_data(struct
>>>>>>>>>>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>> index 22c10b97ea81..927b3d5bb1d0 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>> @@ -275,7 +275,7 @@ static int
>>>>>>>>>>>>>>>>>>> xgpu_nv_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>> static void xgpu_nv_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev = container_of(virt,
>>>>>>>>>>>>>>>>>>> struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>> int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>>>>>>>>>>> @@ -407,7 +407,8 @@ int
>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> @@ -416,6 +417,8 @@ void
>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops xgpu_nv_virt_ops = {
>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>> index 7b63d30b9b79..1d4ef5c70730 100644
>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>> @@ -512,7 +512,7 @@ static int
>>>>>>>>>>>>>>>>>>> xgpu_vi_set_mailbox_ack_irq(struct amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>> static void xgpu_vi_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev = container_of(virt,
>>>>>>>>>>>>>>>>>>> struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>> /* wait until RCV_MSG become 3 */
>>>>>>>>>>>>>>>>>>> @@ -610,7 +610,8 @@ int
>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> @@ -619,6 +620,8 @@ void
>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
More information about the amd-gfx
mailing list