[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Wed May 11 15:35:24 UTC 2022
On 2022-05-11 11:20, Lazar, Lijo wrote:
>
>
> On 5/11/2022 7:28 PM, Christian König wrote:
>> Am 11.05.22 um 15:43 schrieb Andrey Grodzovsky:
>>> On 2022-05-11 03:38, Christian König wrote:
>>>> Am 10.05.22 um 20:53 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>>> E.g. in the reset code (either before or after the reset, that's
>>>>>> debatable) you do something like this:
>>>>>>
>>>>>> for (i = 0; i < num_ring; ++i)
>>>>>> cancel_delayed_work(ring[i]->scheduler....)
>>>>>> cancel_work(adev->ras_work);
>>>>>> cancel_work(adev->iofault_work);
>>>>>> cancel_work(adev->debugfs_work);
>>>>>> ...
>>>>>>
>>>>>> You don't really need to track which reset source has fired and
>>>>>> which hasn't, because that would be racy again. Instead just
>>>>>> bluntly reset all possible sources.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> I don't say we care if it fired once or twice (I need to add a fix
>>>>> to only insert reset work to pending reset list if it's not
>>>>> already there), the point of using list (or array) to which you
>>>>> add and from which you remove is that the logic of this is
>>>>> encapsulated within reset domain. In your way we need to be aware
>>>>> who exactly schedules reset work and explicitly cancel them, this
>>>>> also means that for any new source added in the future you will
>>>>> need to remember to add him
>>>>
>>>> I don't think that this is a valid argument. Additionally to the
>>>> schedulers we probably just need less than a handful of reset
>>>> sources, most likely even just one or two is enough.
>>>>
>>>> The only justification I can see of having additional separate
>>>> reset sources would be if somebody wants to know if a specific
>>>> source has been handled or not (e.g. call flush_work() or
>>>> work_pending()). Like in the case of a reset triggered through
>>>> debugfs.
>>>
>>>
>>> This is indeed one reason, another is as we said before that if you
>>> share 'reset source' (meaning a delayed work) with another client
>>> (i.e. RAS and KFD) it means you make assumption that the other
>>> client always proceeds with the
>>> reset exactly the same way as you expect. So today we have this only
>>> in scheduler vs non scheduler reset happening - non scheduler reset
>>> clients assume the reset is always fully executed in HW while
>>> scheduler based reset makes shortcuts and not always does HW reset
>>> hence they cannot share 'reset source' (delayed work). Yes, we can
>>> always add this in the future if and when such problem will arise
>>> but no one will remember this then and a new bug will be introduced
>>> and will take time to find and resolve.
>>
>> Mhm, so your main concern is that we forget to correctly handle the
>> new reset sources?
>>
>> How about we do it like this then:
>>
>> struct amdgpu_reset_domain {
>> ....
>> union {
>> struct {
>> struct work_item debugfs;
>> struct work_item ras;
>> ....
>> };
>> struct work_item array[]
>> } reset_sources;
>> }
>>
>
> If it's only about static array,
>
> enum amdgpu_reset_soruce {
>
> AMDGPU_RESET_SRC_RAS,
> AMDGPU_RESET_SRC_ABC,
> .....
> AMDGPU_RESET_SRC_XYZ,
> AMDGPU_RESET_SRC_MAX
>
> };
>
> struct work_struct reset_work[AMDGPU_RESET_SRC_MAX]; => An index for
> each work item
>
>
> Thanks,
> Lijo
It's possible though it makes harder to generalize reset_domain later
for other drivers.
But still one caveat, look at amdgpu_recover_work_struct and it's usage
in amdgpu_device_gpu_recover and in gpu_recover_get,
At least for debugfs i need to return back the result of GPU reset and
so I cannot store actual work items in the array mentioned above
but rather pointers to work_item because i need a way to get back the
return value from gpu_recover like I do now in amdgpu_device_gpu_recover.
Andrey
>
>> Not 100% sure if that works, but something like that should do the
>> trick.
>>
>> My main concern is that I don't want to allocate the work items on
>> the stack and dynamic allocation (e.g. kmalloc) is usually not
>> possible either.
>>
>> Additional to that putting/removing work items from a list, array or
>> other container is a very common source for race conditions.
>>
>> Regards,
>> Christian.
>>
>>>
>>>>> to the cancellation list which you showed above. In current way
>>>>> all this done automatically within reset_domain code and it's
>>>>> agnostic to specific driver and it's specific list of reset
>>>>> sources. Also in case we would want to generalize reset_domain to
>>>>> other GPU drivers (which was
>>>>> a plan as far as i remember) this explicit mention of each reset
>>>>> works for cancellation is again less suitable in my opinion.
>>>>
>>>> Well we could put the work item for the scheduler independent reset
>>>> source into the reset domain as well. But I'm not sure those
>>>> additional reset sources should be part of any common handling,
>>>> that is largely amdgpu specific.
>>>
>>>
>>> So it's for sure more then one source for the reasons described
>>> above, also note that for scheduler we already cancel delayed work
>>> in drm_sched_stop so calling them again in amdgpu code kind of
>>> superfluous.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The only difference is I chose to do the canceling right
>>>>>>>>> BEFORE the HW reset and not AFTER. I did this because I see a
>>>>>>>>> possible race where a new reset request is being generated
>>>>>>>>> exactly after we finished the HW reset but before we canceled
>>>>>>>>> out all pending resets - in such case you wold not want to
>>>>>>>>> cancel this 'border line new' reset request.
>>>>>>>>
>>>>>>>> Why not? Any new reset request directly after a hardware reset
>>>>>>>> is most likely just falsely generated by the reset itself.
>>>>>>>>
>>>>>>>> Ideally I would cancel all sources after the reset, but before
>>>>>>>> starting any new work.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>> You can see that if many different reset sources share same
>>>>>>>>>>>> work struct what can happen is that the first to obtain the
>>>>>>>>>>>> lock you describe bellow might opt out from full HW reset
>>>>>>>>>>>> because his bad job did signal for example or because his
>>>>>>>>>>>> hunged IP block was able to recover through SW reset but in
>>>>>>>>>>>> the meantime another reset source who needed an actual HW
>>>>>>>>>>>> reset just silently returned and we end up with unhandled
>>>>>>>>>>>> reset request. True that today this happens only to job
>>>>>>>>>>>> timeout reset sources that are handled form within the
>>>>>>>>>>>> scheduler and won't use this single work struct but no one
>>>>>>>>>>>> prevents a future case for this to happen and also, if we
>>>>>>>>>>>> actually want to unify scheduler time out handlers within
>>>>>>>>>>>> reset domain (which seems to me the right design approach)
>>>>>>>>>>>> we won't be able to use just one work struct for this
>>>>>>>>>>>> reason anyway.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Just to add to this point - a reset domain is co-operative
>>>>>>>>>>> domain. In addition to sharing a set of clients from various
>>>>>>>>>>> reset sources for one device, it also will have a set of
>>>>>>>>>>> devices like in XGMI hive. The job timeout on one device may
>>>>>>>>>>> not eventually result in result, but a RAS error happening
>>>>>>>>>>> on another device at the same time would need a reset. The
>>>>>>>>>>> second device's RAS error cannot return seeing that a reset
>>>>>>>>>>> work already started, or ignore the reset work given that
>>>>>>>>>>> another device has filled the reset data.
>>>>>>>>>>>
>>>>>>>>>>> When there is a reset domain, it should take care of the
>>>>>>>>>>> work scheduled and keeping it in device or any other level
>>>>>>>>>>> doesn't sound good.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Lijo
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd put the reset work struct into the reset_domain
>>>>>>>>>>>>> struct. That way you'd have exactly one worker for the
>>>>>>>>>>>>> reset domain. You could implement a lock-less scheme to
>>>>>>>>>>>>> decide whether you need to schedule a reset, e.g. using an
>>>>>>>>>>>>> atomic counter in the shared work struct that gets
>>>>>>>>>>>>> incremented when a client wants to trigger a reset
>>>>>>>>>>>>> (atomic_add_return). If that counter is exactly 1 after
>>>>>>>>>>>>> incrementing, you need to fill in the rest of the work
>>>>>>>>>>>>> struct and schedule the work. Otherwise, it's already
>>>>>>>>>>>>> scheduled (or another client is in the process of
>>>>>>>>>>>>> scheduling it) and you just return. When the worker
>>>>>>>>>>>>> finishes (after confirming a successful reset), it resets
>>>>>>>>>>>>> the counter to 0, so the next client requesting a reset
>>>>>>>>>>>>> will schedule the worker again.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Felix
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Additional to that keep in mind that you can't allocate
>>>>>>>>>>>>>>> any memory before or during the GPU reset nor wait for
>>>>>>>>>>>>>>> the reset to complete (so you can't allocate anything on
>>>>>>>>>>>>>>> the stack either).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no dynamic allocation here, regarding stack
>>>>>>>>>>>>>> allocations - we do it all the time when we call
>>>>>>>>>>>>>> functions, even during GPU resets, how on stack
>>>>>>>>>>>>>> allocation of work struct in amdgpu_device_gpu_recover is
>>>>>>>>>>>>>> different from any other local variable we allocate in
>>>>>>>>>>>>>> any function we call ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am also not sure why it's not allowed to wait for reset
>>>>>>>>>>>>>> to complete ? Also, see in amdgpu_ras_do_recovery and
>>>>>>>>>>>>>> gpu_recover_get (debugfs) - the caller expects the reset
>>>>>>>>>>>>>> to complete before he returns. I can probably work around
>>>>>>>>>>>>>> it in RAS code by calling atomic_set(&ras->in_recovery,
>>>>>>>>>>>>>> 0) from some callback within actual reset function but
>>>>>>>>>>>>>> regarding sysfs it actually expects a result returned
>>>>>>>>>>>>>> indicating whether the call was successful or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think that concept you try here will work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also in general seems to me it's cleaner approach where
>>>>>>>>>>>>>>>> this logic (the work items) are held and handled in
>>>>>>>>>>>>>>>> reset_domain and are not split in each adev or any
>>>>>>>>>>>>>>>> other entity. We might want in the future to even move
>>>>>>>>>>>>>>>> the scheduler handling into reset domain since reset
>>>>>>>>>>>>>>>> domain is supposed to be a generic things and not only
>>>>>>>>>>>>>>>> or AMD.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky
>>>>>>>>>>>>>>>>>>>> <andrey.grodzovsky at amd.com>
>>>>>>>>>>>>>>>>>>>> Tested-by: Bai Zoy <Zoy.Bai at amd.com>
>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 +---
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 3 +
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 73
>>>>>>>>>>>>>>>>>>>> +++++++++++++++++++++-
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 +-
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 7 ++-
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 7 ++-
>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 7 ++-
>>>>>>>>>>>>>>>>>>>> 8 files changed, 104 insertions(+), 24 deletions(-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>>> index 4264abc5604d..99efd8317547 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>>>>>>>>>>>>>>>> @@ -109,6 +109,7 @@
>>>>>>>>>>>>>>>>>>>> #include "amdgpu_fdinfo.h"
>>>>>>>>>>>>>>>>>>>> #include "amdgpu_mca.h"
>>>>>>>>>>>>>>>>>>>> #include "amdgpu_ras.h"
>>>>>>>>>>>>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>>>>>>>>>>>> #define MAX_GPU_INSTANCE 16
>>>>>>>>>>>>>>>>>>>> @@ -509,16 +510,6 @@ struct
>>>>>>>>>>>>>>>>>>>> amdgpu_allowed_register_entry {
>>>>>>>>>>>>>>>>>>>> bool grbm_indexed;
>>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>>> -enum amd_reset_method {
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_NONE = -1,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE0,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE1,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_MODE2,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_BACO,
>>>>>>>>>>>>>>>>>>>> - AMD_RESET_METHOD_PCI,
>>>>>>>>>>>>>>>>>>>> -};
>>>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>>>> struct amdgpu_video_codec_info {
>>>>>>>>>>>>>>>>>>>> u32 codec_type;
>>>>>>>>>>>>>>>>>>>> u32 max_width;
>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>> index e582f1044c0f..7fa82269c30f 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>> @@ -5201,6 +5201,12 @@ int
>>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(struct amdgpu_device
>>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> tmp_vram_lost_counter =
>>>>>>>>>>>>>>>>>>>> atomic_read(&((adev)->vram_lost_counter));
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + /* Drop all pending resets since we will reset
>>>>>>>>>>>>>>>>>>>> now anyway */
>>>>>>>>>>>>>>>>>>>> + tmp_adev =
>>>>>>>>>>>>>>>>>>>> list_first_entry(device_list_handle, struct
>>>>>>>>>>>>>>>>>>>> amdgpu_device,
>>>>>>>>>>>>>>>>>>>> + reset_list);
>>>>>>>>>>>>>>>>>>>> + amdgpu_reset_pending_list(tmp_adev->reset_domain);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> /* Actual ASIC resets if needed.*/
>>>>>>>>>>>>>>>>>>>> /* Host driver will handle XGMI hive reset
>>>>>>>>>>>>>>>>>>>> for SRIOV */
>>>>>>>>>>>>>>>>>>>> if (amdgpu_sriov_vf(adev)) {
>>>>>>>>>>>>>>>>>>>> @@ -5296,7 +5302,7 @@ int
>>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(struct amdgpu_device
>>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> struct amdgpu_recover_work_struct {
>>>>>>>>>>>>>>>>>>>> - struct work_struct base;
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct base;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_job *job;
>>>>>>>>>>>>>>>>>>>> int ret;
>>>>>>>>>>>>>>>>>>>> @@ -5304,7 +5310,7 @@ struct
>>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct {
>>>>>>>>>>>>>>>>>>>> static void
>>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work(struct
>>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> - struct amdgpu_recover_work_struct
>>>>>>>>>>>>>>>>>>>> *recover_work = container_of(work, struct
>>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct, base);
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_recover_work_struct
>>>>>>>>>>>>>>>>>>>> *recover_work = container_of(work, struct
>>>>>>>>>>>>>>>>>>>> amdgpu_recover_work_struct, base.base.work);
>>>>>>>>>>>>>>>>>>>> recover_work->ret =
>>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover_imp(recover_work->adev,
>>>>>>>>>>>>>>>>>>>> recover_work->job);
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> @@ -5316,12 +5322,15 @@ int
>>>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> struct amdgpu_recover_work_struct work =
>>>>>>>>>>>>>>>>>>>> {.adev = adev, .job = job};
>>>>>>>>>>>>>>>>>>>> - INIT_WORK(&work.base,
>>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&work.base.base,
>>>>>>>>>>>>>>>>>>>> amdgpu_device_queue_gpu_recover_work);
>>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&work.base.node);
>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>> (!amdgpu_reset_domain_schedule(adev->reset_domain,
>>>>>>>>>>>>>>>>>>>> &work.base))
>>>>>>>>>>>>>>>>>>>> return -EAGAIN;
>>>>>>>>>>>>>>>>>>>> - flush_work(&work.base);
>>>>>>>>>>>>>>>>>>>> + flush_delayed_work(&work.base.base);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>>> &work.base);
>>>>>>>>>>>>>>>>>>>> return work.ret;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>>> index c80af0889773..ffddd419c351 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>>>>>>>>>>>>>>>>>>>> @@ -134,6 +134,9 @@ struct amdgpu_reset_domain
>>>>>>>>>>>>>>>>>>>> *amdgpu_reset_create_reset_domain(enum amdgpu_reset_d
>>>>>>>>>>>>>>>>>>>> atomic_set(&reset_domain->in_gpu_reset, 0);
>>>>>>>>>>>>>>>>>>>> init_rwsem(&reset_domain->sem);
>>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&reset_domain->pending_works);
>>>>>>>>>>>>>>>>>>>> + mutex_init(&reset_domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> return reset_domain;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>>> index 1949dbe28a86..863ec5720fc1 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
>>>>>>>>>>>>>>>>>>>> @@ -24,7 +24,18 @@
>>>>>>>>>>>>>>>>>>>> #ifndef __AMDGPU_RESET_H__
>>>>>>>>>>>>>>>>>>>> #define __AMDGPU_RESET_H__
>>>>>>>>>>>>>>>>>>>> -#include "amdgpu.h"
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +#include <linux/atomic.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/mutex.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/list.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/kref.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/rwsem.h>
>>>>>>>>>>>>>>>>>>>> +#include <linux/workqueue.h>
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +struct amdgpu_device;
>>>>>>>>>>>>>>>>>>>> +struct amdgpu_job;
>>>>>>>>>>>>>>>>>>>> +struct amdgpu_hive_info;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> enum AMDGPU_RESET_FLAGS {
>>>>>>>>>>>>>>>>>>>> @@ -32,6 +43,17 @@ enum AMDGPU_RESET_FLAGS {
>>>>>>>>>>>>>>>>>>>> AMDGPU_SKIP_HW_RESET = 1,
>>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +enum amd_reset_method {
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_NONE = -1,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_LEGACY = 0,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE0,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE1,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_MODE2,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_BACO,
>>>>>>>>>>>>>>>>>>>> + AMD_RESET_METHOD_PCI,
>>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_context {
>>>>>>>>>>>>>>>>>>>> enum amd_reset_method method;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_device *reset_req_dev;
>>>>>>>>>>>>>>>>>>>> @@ -40,6 +62,8 @@ struct amdgpu_reset_context {
>>>>>>>>>>>>>>>>>>>> unsigned long flags;
>>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>>> +struct amdgpu_reset_control;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_handler {
>>>>>>>>>>>>>>>>>>>> enum amd_reset_method reset_method;
>>>>>>>>>>>>>>>>>>>> struct list_head handler_list;
>>>>>>>>>>>>>>>>>>>> @@ -76,12 +100,21 @@ enum amdgpu_reset_domain_type {
>>>>>>>>>>>>>>>>>>>> XGMI_HIVE
>>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +struct amdgpu_reset_work_struct {
>>>>>>>>>>>>>>>>>>>> + struct delayed_work base;
>>>>>>>>>>>>>>>>>>>> + struct list_head node;
>>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> struct amdgpu_reset_domain {
>>>>>>>>>>>>>>>>>>>> struct kref refcount;
>>>>>>>>>>>>>>>>>>>> struct workqueue_struct *wq;
>>>>>>>>>>>>>>>>>>>> enum amdgpu_reset_domain_type type;
>>>>>>>>>>>>>>>>>>>> struct rw_semaphore sem;
>>>>>>>>>>>>>>>>>>>> atomic_t in_gpu_reset;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + struct list_head pending_works;
>>>>>>>>>>>>>>>>>>>> + struct mutex reset_lock;
>>>>>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>>>>>> @@ -113,9 +146,43 @@ static inline void
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_put_reset_domain(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *dom
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> static inline bool
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_schedule(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>>>>>>>>>>>>> - struct work_struct *work)
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct *work)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> - return queue_work(domain->wq, work);
>>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + if (!queue_delayed_work(domain->wq,
>>>>>>>>>>>>>>>>>>>> &work->base, 0)) {
>>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> + return false;
>>>>>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + list_add_tail(&work->node, &domain->pending_works);
>>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + return true;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain,
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct
>>>>>>>>>>>>>>>>>>>> *work)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> + list_del_init(&work->node);
>>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +static inline void
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_pending_list(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *domain)
>>>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct *entry, *tmp;
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + mutex_lock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> + list_for_each_entry_safe(entry, tmp,
>>>>>>>>>>>>>>>>>>>> &domain->pending_works, node) {
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + list_del_init(&entry->node);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + /* Stop any other related pending resets */
>>>>>>>>>>>>>>>>>>>> + cancel_delayed_work(&entry->base);
>>>>>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> + mutex_unlock(&domain->reset_lock);
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> void amdgpu_device_lock_reset_domain(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain *reset_domain);
>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>>> index 239f232f9c02..574e870d3064 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>>>>>>>>>>>>>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>>>>>>>>>>>>>> #define AMDGPU_VIRT_H
>>>>>>>>>>>>>>>>>>>> #include "amdgv_sriovmsg.h"
>>>>>>>>>>>>>>>>>>>> +#include "amdgpu_reset.h"
>>>>>>>>>>>>>>>>>>>> #define AMDGPU_SRIOV_CAPS_SRIOV_VBIOS (1 << 0)
>>>>>>>>>>>>>>>>>>>> /* vBIOS is sr-iov ready */
>>>>>>>>>>>>>>>>>>>> #define AMDGPU_SRIOV_CAPS_ENABLE_IOV (1 << 1) /*
>>>>>>>>>>>>>>>>>>>> sr-iov is enabled on this GPU */
>>>>>>>>>>>>>>>>>>>> @@ -230,7 +231,7 @@ struct amdgpu_virt {
>>>>>>>>>>>>>>>>>>>> uint32_t reg_val_offs;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_irq_src ack_irq;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_irq_src rcv_irq;
>>>>>>>>>>>>>>>>>>>> - struct work_struct flr_work;
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_reset_work_struct flr_work;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_mm_table mm_table;
>>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops *ops;
>>>>>>>>>>>>>>>>>>>> struct amdgpu_vf_error_buffer vf_errors;
>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>>> index b81acf59870c..f3d1c2be9292 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>>>>>>>>>>>>>>>>>>>> @@ -251,7 +251,7 @@ static int
>>>>>>>>>>>>>>>>>>>> xgpu_ai_set_mailbox_ack_irq(struct amdgpu_device
>>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>>> static void xgpu_ai_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev =
>>>>>>>>>>>>>>>>>>>> container_of(virt, struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>>> int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>>>>>>>>>>>> @@ -380,7 +380,8 @@ int
>>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> @@ -389,6 +390,8 @@ void
>>>>>>>>>>>>>>>>>>>> xgpu_ai_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> static int xgpu_ai_request_init_data(struct
>>>>>>>>>>>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>>> index 22c10b97ea81..927b3d5bb1d0 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>>>>>>>>>>>>>>>>>>>> @@ -275,7 +275,7 @@ static int
>>>>>>>>>>>>>>>>>>>> xgpu_nv_set_mailbox_ack_irq(struct amdgpu_device
>>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>>> static void xgpu_nv_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev =
>>>>>>>>>>>>>>>>>>>> container_of(virt, struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>>> int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
>>>>>>>>>>>>>>>>>>>> @@ -407,7 +407,8 @@ int
>>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> @@ -416,6 +417,8 @@ void
>>>>>>>>>>>>>>>>>>>> xgpu_nv_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops xgpu_nv_virt_ops = {
>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>>> index 7b63d30b9b79..1d4ef5c70730 100644
>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>>>>>>>>>>>>>>>>>>>> @@ -512,7 +512,7 @@ static int
>>>>>>>>>>>>>>>>>>>> xgpu_vi_set_mailbox_ack_irq(struct amdgpu_device
>>>>>>>>>>>>>>>>>>>> *adev,
>>>>>>>>>>>>>>>>>>>> static void xgpu_vi_mailbox_flr_work(struct
>>>>>>>>>>>>>>>>>>>> work_struct *work)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> - struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work);
>>>>>>>>>>>>>>>>>>>> + struct amdgpu_virt *virt = container_of(work,
>>>>>>>>>>>>>>>>>>>> struct amdgpu_virt, flr_work.base.work);
>>>>>>>>>>>>>>>>>>>> struct amdgpu_device *adev =
>>>>>>>>>>>>>>>>>>>> container_of(virt, struct amdgpu_device, virt);
>>>>>>>>>>>>>>>>>>>> /* wait until RCV_MSG become 3 */
>>>>>>>>>>>>>>>>>>>> @@ -610,7 +610,8 @@ int
>>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_get_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> return r;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> - INIT_WORK(&adev->virt.flr_work,
>>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_DELAYED_WORK(&adev->virt.flr_work.base,
>>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_flr_work);
>>>>>>>>>>>>>>>>>>>> + INIT_LIST_HEAD(&adev->virt.flr_work.node);
>>>>>>>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> @@ -619,6 +620,8 @@ void
>>>>>>>>>>>>>>>>>>>> xgpu_vi_mailbox_put_irq(struct amdgpu_device *adev)
>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.ack_irq, 0);
>>>>>>>>>>>>>>>>>>>> amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>> amdgpu_reset_domain_del_pendning_work(adev->reset_domain,
>>>>>>>>>>>>>>>>>>>> &adev->virt.flr_work);
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
More information about the amd-gfx
mailing list