[PATCH 5/5] drm/amdgpu: Refactor GPU reset for XGMI hive case.
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Thu Nov 22 19:56:50 UTC 2018
On 11/22/2018 02:03 PM, Christian König wrote:
> Am 22.11.18 um 16:44 schrieb Grodzovsky, Andrey:
>>
>> On 11/22/2018 06:16 AM, Christian König wrote:
>>> How about using a lock per hive and then acquiring that with trylock()
>>> instead?
>>>
>>> This way you should at least catch cases where multiple causes try to
>>> reset the same hive at the same time. True that there is still some
>>> racing involved, but it's at least a good start.
>> What about using a per hive work_struct which, when scheduled, will
>> execute amdgpu_device_gpu_recover? Since the work queue rejects
>> duplicates we will get per hive serialization automatically from that,
>> without needing any new mutex. This might also be necessary anyway for
>> RAS, as I believe RAS will trigger an interrupt when errors are
>> detected and might then decide to reset the GPU, so we will have to
>> switch to bottom half context anyway. And I don't think it's a problem
>> to schedule this work from the job timeout handler if needed.
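>>
>> Something like this is what I have in mind (just a sketch, the work
>> item in amdgpu_hive_info and the function name are made up):
>>
>> struct amdgpu_hive_info {
>>         /* existing members ... */
>>         struct work_struct reset_work;
>> };
>>
>> static void amdgpu_xgmi_reset_work(struct work_struct *work)
>> {
>>         struct amdgpu_hive_info *hive =
>>                 container_of(work, struct amdgpu_hive_info, reset_work);
>>         struct amdgpu_device *tmp_adev;
>>
>>         /* Bottom half (process) context, at most one instance in
>>          * flight per hive. */
>>         list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
>>                 amdgpu_device_gpu_recover(tmp_adev, NULL);
>> }
>>
>> /* Job timeout, RAS interrupt and sysfs handlers would then just do: */
>> schedule_work(&hive->reset_work);
>> /* schedule_work() is a no-op while the work is still pending, so
>>  * duplicate reset requests for the same hive collapse into one. */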
>
> Mhm, what is the advantage over the trylock variant? The key problem
> with that is how to get the offending job into the background worker.
>
> E.g. we have trouble guaranteeing that the job isn't destroyed when
> this is in a background worker.
Yea, actually there isn't one.
>
>>>
>>>
>>> Additionally, I would try to improve the pre, middle, post handling
>>> towards checking if we made some progress in between.
>>>
>>> In other words we stop all schedulers in the pre handling and
>>> disconnect the scheduler fences from the hardware fence like I did in
>>> patch "drm/sched: fix timeout handling v2".
>>>
>>> Then before we do the actual reset in the middle handling we check if
>>> the offending job has completed or at least made some progress in the
>>> meantime.
>> I understand how to check if the job completed - if its fence has
>> already signaled - but how do I test whether the job made 'at least
>> some progress'?
>
> Good question. Maybe we can somehow query from the hardware the number
> of primitives or pixels processed so far and then compare after a moment?
I will check on this later. In the meantime I will update the code
with the proposed per hive locking, and I will add a check for whether
the guilty job completed before the ASIC reset, skipping the reset if it
did.
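
Roughly like this (untested sketch; hive->hive_lock is the per hive
mutex I would add, name TBD):

        /* Bail if another node in the hive already started a reset. */
        if (hive && !mutex_trylock(&hive->hive_lock)) {
                DRM_INFO("Dropping reset request, another reset in progress for hive\n");
                return 0;
        }

        /* ... pre asic reset for all nodes ... */

        /* Right before the actual ASIC reset, skip the HW reset if the
         * guilty job managed to complete in the meantime. */
        if (job && dma_fence_is_signaled(&job->base.s_fence->finished)) {
                DRM_INFO("Guilty job already signaled, skipping HW reset\n");
                need_full_reset = false;
        }
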
Andrey
>
>> Another question - what's the purpose of this progress check? If I've
>> already completed the pre handling sequence I can't bail out even if
>> the guilty job is signaled by the time I do the progress check; I
>> have to complete at least the post handling too. Do you mean I can at
>> least skip the ASIC reset phase in that case?
>
> Yes exactly. We have a rather big race problem in the current reset
> logic between completing the job and resetting the hardware.
>
> In other words, when the job completes at exactly the moment we reset
> the GPU, we try to signal it twice, etc. The whole handling here
> is not really thought through.
>
> Christian.
>
>>
>> Andrey
>>
>>> In the case of a manual reset we skip that because we don't have an
>>> offending job to check.
>>>
>>> In the post handling we stitch everything together again and start the
>>> scheduler to go on with job submission.
>>>
>>> Christian.
>>>
>>> Am 21.11.18 um 23:02 schrieb Grodzovsky, Andrey:
>>>> It depends on the reason for triggering the reset on that node - how
>>>> do we know?
>>>> If the reason was a RAS error it's probably not hard to check that
>>>> all errors are cleared, but
>>>> if the reason was a job timeout on that specific node I will need to
>>>> recheck that no jobs are left in an incomplete state.
>>>> And if the reason is a manual GPU reset trigger from sysfs, then
>>>> what's the policy?
>>>> Sounds to me it's just easier to go ahead and allow all the pending
>>>> resets to proceed - unless there is
>>>> a clear and quick criterion you can check after you grab the mutex,
>>>> then sure - but I don't know what it would be.
>>>>
>>>> Andrey
>>>>
>>>> On 11/21/2018 03:49 PM, Liu, Shaoyun wrote:
>>>>> I saw you use the global xgmi_mutex to prevent concurrent resets
>>>>> from being
>>>>> triggered by different nodes, but after the mutex has been released,
>>>>> the current node may grab the mutex and continue to do another reset.
>>>>> Maybe we should check the GPU status and skip the reset in this
>>>>> case,
>>>>> since the GPU may already be in a good state.
>>>>>
>>>>> Regards
>>>>>
>>>>> shaoyun.liu
>>>>>
>>>>> On 2018-11-21 1:10 p.m., Andrey Grodzovsky wrote:
>>>>>> For the XGMI hive case, do the reset in steps where each step
>>>>>> iterates over all devs in the hive. This is especially important
>>>>>> for ASIC reset, since all PSP FW in the hive must come up within a
>>>>>> limited time (around 1 sec) to properly negotiate the link.
>>>>>> Do this by refactoring amdgpu_device_gpu_recover and
>>>>>> amdgpu_device_reset
>>>>>> into pre_asic_reset, asic_reset and post_asic_reset functions,
>>>>>> where each part is executed for all the GPUs in the hive before
>>>>>> going to the next step.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 5 +-
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 375 ++++++++++++++++++++---------
>>>>>> 2 files changed, 264 insertions(+), 116 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index 4ef5f7a..bd06d45 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -1026,6 +1026,9 @@ struct amdgpu_device {
>>>>>> unsigned long last_mm_index;
>>>>>> bool in_gpu_reset;
>>>>>> struct mutex lock_reset;
>>>>>> +
>>>>>> + int asic_reset_res;
>>>>>> + int resched;
>>>>>> };
>>>>>> static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>>>>>> @@ -1232,7 +1235,7 @@ struct amdgpu_hive_info;
>>>>>> struct list_head *amdgpu_xgmi_get_adev_list_handle(struct amdgpu_hive_info *hive);
>>>>>> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>>>>>> -int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive);
>>>>>> +int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
>>>>>> int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>>>>>> /*
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index cb06e68..8e94d7f 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -3157,86 +3157,6 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>>>>> return 0;
>>>>>> }
>>>>>> -/**
>>>>>> - * amdgpu_device_reset - reset ASIC/GPU for bare-metal or passthrough
>>>>>> - *
>>>>>> - * @adev: amdgpu device pointer
>>>>>> - *
>>>>>> - * attempt to do soft-reset or full-reset and reinitialize Asic
>>>>>> - * return 0 means succeeded otherwise failed
>>>>>> - */
>>>>>> -static int amdgpu_device_reset(struct amdgpu_device *adev)
>>>>>> -{
>>>>>> - bool need_full_reset, vram_lost = 0;
>>>>>> - int r;
>>>>>> -
>>>>>> - need_full_reset = amdgpu_device_ip_need_full_reset(adev);
>>>>>> -
>>>>>> - if (!need_full_reset) {
>>>>>> - amdgpu_device_ip_pre_soft_reset(adev);
>>>>>> - r = amdgpu_device_ip_soft_reset(adev);
>>>>>> - amdgpu_device_ip_post_soft_reset(adev);
>>>>>> - if (r || amdgpu_device_ip_check_soft_reset(adev)) {
>>>>>> - DRM_INFO("soft reset failed, will fallback to full
>>>>>> reset!\n");
>>>>>> - need_full_reset = true;
>>>>>> - }
>>>>>> - }
>>>>>> -
>>>>>> - if (need_full_reset) {
>>>>>> - r = amdgpu_device_ip_suspend(adev);
>>>>>> -
>>>>>> -retry:
>>>>>> - r = amdgpu_asic_reset(adev);
>>>>>> - /* post card */
>>>>>> - amdgpu_atom_asic_init(adev->mode_info.atom_context);
>>>>>> -
>>>>>> - if (!r) {
>>>>>> - dev_info(adev->dev, "GPU reset succeeded, trying to
>>>>>> resume\n");
>>>>>> - r = amdgpu_device_ip_resume_phase1(adev);
>>>>>> - if (r)
>>>>>> - goto out;
>>>>>> -
>>>>>> - vram_lost = amdgpu_device_check_vram_lost(adev);
>>>>>> - if (vram_lost) {
>>>>>> - DRM_ERROR("VRAM is lost!\n");
>>>>>> - atomic_inc(&adev->vram_lost_counter);
>>>>>> - }
>>>>>> -
>>>>>> - r = amdgpu_gtt_mgr_recover(
>>>>>> - &adev->mman.bdev.man[TTM_PL_TT]);
>>>>>> - if (r)
>>>>>> - goto out;
>>>>>> -
>>>>>> - r = amdgpu_device_fw_loading(adev);
>>>>>> - if (r)
>>>>>> - return r;
>>>>>> -
>>>>>> - r = amdgpu_device_ip_resume_phase2(adev);
>>>>>> - if (r)
>>>>>> - goto out;
>>>>>> -
>>>>>> - if (vram_lost)
>>>>>> - amdgpu_device_fill_reset_magic(adev);
>>>>>> - }
>>>>>> - }
>>>>>> -
>>>>>> -out:
>>>>>> - if (!r) {
>>>>>> - amdgpu_irq_gpu_reset_resume_helper(adev);
>>>>>> - r = amdgpu_ib_ring_tests(adev);
>>>>>> - if (r) {
>>>>>> - dev_err(adev->dev, "ib ring test failed (%d).\n", r);
>>>>>> - r = amdgpu_device_ip_suspend(adev);
>>>>>> - need_full_reset = true;
>>>>>> - goto retry;
>>>>>> - }
>>>>>> - }
>>>>>> -
>>>>>> - if (!r)
>>>>>> - r = amdgpu_device_recover_vram(adev);
>>>>>> -
>>>>>> - return r;
>>>>>> -}
>>>>>> /**
>>>>>> * amdgpu_device_reset_sriov - reset ASIC for SR-IOV vf
>>>>>> @@ -3335,31 +3255,16 @@ bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev)
>>>>>> return false;
>>>>>> }
>>>>>> -/**
>>>>>> - * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>>>>>> - *
>>>>>> - * @adev: amdgpu device pointer
>>>>>> - * @job: which job trigger hang
>>>>>> - *
>>>>>> - * Attempt to reset the GPU if it has hung (all asics).
>>>>>> - * Returns 0 for success or an error on failure.
>>>>>> - */
>>>>>> -int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>> - struct amdgpu_job *job)
>>>>>> -{
>>>>>> - int i, r, resched;
>>>>>> -
>>>>>> - dev_info(adev->dev, "GPU reset begin!\n");
>>>>>> -
>>>>>> - mutex_lock(&adev->lock_reset);
>>>>>> - atomic_inc(&adev->gpu_reset_counter);
>>>>>> - adev->in_gpu_reset = 1;
>>>>>> - /* Block kfd */
>>>>>> - amdgpu_amdkfd_pre_reset(adev);
>>>>>> +static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>>>>> + struct amdgpu_job *job,
>>>>>> + bool *need_full_reset_arg)
>>>>>> +{
>>>>>> + int i, r = 0;
>>>>>> + bool need_full_reset = *need_full_reset_arg;
>>>>>> /* block TTM */
>>>>>> - resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>>>>>> + adev->resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>>>>>> /* block all schedulers and reset given job's ring */
>>>>>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>> @@ -3379,10 +3284,121 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>> amdgpu_fence_driver_force_completion(ring);
>>>>>> }
>>>>>> - if (amdgpu_sriov_vf(adev))
>>>>>> - r = amdgpu_device_reset_sriov(adev, job ? false : true);
>>>>>> - else
>>>>>> - r = amdgpu_device_reset(adev);
>>>>>> + if (!amdgpu_sriov_vf(adev)) {
>>>>>> +
>>>>>> + if (!need_full_reset)
>>>>>> + need_full_reset = amdgpu_device_ip_need_full_reset(adev);
>>>>>> +
>>>>>> + if (!need_full_reset) {
>>>>>> + amdgpu_device_ip_pre_soft_reset(adev);
>>>>>> + r = amdgpu_device_ip_soft_reset(adev);
>>>>>> + amdgpu_device_ip_post_soft_reset(adev);
>>>>>> + if (r || amdgpu_device_ip_check_soft_reset(adev)) {
>>>>>> + DRM_INFO("soft reset failed, will fallback to full
>>>>>> reset!\n");
>>>>>> + need_full_reset = true;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> + if (need_full_reset)
>>>>>> + r = amdgpu_device_ip_suspend(adev);
>>>>>> +
>>>>>> + *need_full_reset_arg = need_full_reset;
>>>>>> + }
>>>>>> +
>>>>>> + return r;
>>>>>> +}
>>>>>> +
>>>>>> +static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
>>>>>> + struct list_head *device_list_handle,
>>>>>> + bool *need_full_reset_arg)
>>>>>> +{
>>>>>> + struct amdgpu_device *tmp_adev = NULL;
>>>>>> + bool need_full_reset = *need_full_reset_arg, vram_lost = false;
>>>>>> + int r = 0;
>>>>>> +
>>>>>> + /*
>>>>>> + * ASIC reset has to be done on all XGMI hive nodes ASAP
>>>>>> + * to allow proper links negotiation in FW (within 1 sec)
>>>>>> + */
>>>>>> + if (need_full_reset) {
>>>>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>>>>>> + r = amdgpu_asic_reset(tmp_adev);
>>>>>> + if (r)
>>>>>> + DRM_WARN("ASIC reset failed with err r, %d for drm
>>>>>> dev, %s",
>>>>>> + r, tmp_adev->ddev->unique);
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> +
>>>>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>>>>>> + if (need_full_reset) {
>>>>>> + /* post card */
>>>>>> + if (amdgpu_atom_asic_init(tmp_adev->mode_info.atom_context))
>>>>>> + DRM_WARN("asic atom init failed!");
>>>>>> +
>>>>>> + if (!r) {
>>>>>> + dev_info(tmp_adev->dev, "GPU reset succeeded,
>>>>>> trying to resume\n");
>>>>>> + r = amdgpu_device_ip_resume_phase1(tmp_adev);
>>>>>> + if (r)
>>>>>> + goto out;
>>>>>> +
>>>>>> + vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
>>>>>> + if (vram_lost) {
>>>>>> + DRM_ERROR("VRAM is lost!\n");
>>>>>> + atomic_inc(&tmp_adev->vram_lost_counter);
>>>>>> + }
>>>>>> +
>>>>>> + r = amdgpu_gtt_mgr_recover(
>>>>>> + &tmp_adev->mman.bdev.man[TTM_PL_TT]);
>>>>>> + if (r)
>>>>>> + goto out;
>>>>>> +
>>>>>> + r = amdgpu_device_fw_loading(tmp_adev);
>>>>>> + if (r)
>>>>>> + return r;
>>>>>> +
>>>>>> + r = amdgpu_device_ip_resume_phase2(tmp_adev);
>>>>>> + if (r)
>>>>>> + goto out;
>>>>>> +
>>>>>> + if (vram_lost)
>>>>>> + amdgpu_device_fill_reset_magic(tmp_adev);
>>>>>> +
>>>>>> + /* Update PSP FW topology after reset */
>>>>>> + if (tmp_adev->gmc.xgmi.num_physical_nodes > 1)
>>>>>> + r = amdgpu_xgmi_update_topology(hive, tmp_adev);
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> +
>>>>>> +out:
>>>>>> + if (!r) {
>>>>>> + amdgpu_irq_gpu_reset_resume_helper(tmp_adev);
>>>>>> + r = amdgpu_ib_ring_tests(tmp_adev);
>>>>>> + if (r) {
>>>>>> + dev_err(tmp_adev->dev, "ib ring test failed
>>>>>> (%d).\n", r);
>>>>>> + r = amdgpu_device_ip_suspend(tmp_adev);
>>>>>> + need_full_reset = true;
>>>>>> + r = -EAGAIN;
>>>>>> + goto end;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> + if (!r)
>>>>>> + r = amdgpu_device_recover_vram(tmp_adev);
>>>>>> + else
>>>>>> + tmp_adev->asic_reset_res = r;
>>>>>> + }
>>>>>> +
>>>>>> +end:
>>>>>> + *need_full_reset_arg = need_full_reset;
>>>>>> + return r;
>>>>>> +}
>>>>>> +
>>>>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>>>>>> + struct amdgpu_job *job)
>>>>>> +{
>>>>>> + int i;
>>>>>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>> struct amdgpu_ring *ring = adev->rings[i];
>>>>>> @@ -3394,7 +3410,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>> * or all rings (in the case @job is NULL)
>>>>>> * after above amdgpu_reset accomplished
>>>>>> */
>>>>>> - if ((!job || job->base.sched == &ring->sched) && !r)
>>>>>> + if ((!job || job->base.sched == &ring->sched) && !adev->asic_reset_res)
>>>>>> drm_sched_job_recovery(&ring->sched);
>>>>>> kthread_unpark(ring->sched.thread);
>>>>>> @@ -3404,21 +3420,150 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>> drm_helper_resume_force_mode(adev->ddev);
>>>>>> }
>>>>>> - ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
>>>>>> + ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, adev->resched);
>>>>>> - if (r) {
>>>>>> - /* bad news, how to tell it to userspace ? */
>>>>>> - dev_info(adev->dev, "GPU reset(%d) failed\n",
>>>>>> atomic_read(&adev->gpu_reset_counter));
>>>>>> - amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
>>>>>> - } else {
>>>>>> - dev_info(adev->dev, "GPU reset(%d)
>>>>>> succeeded!\n",atomic_read(&adev->gpu_reset_counter));
>>>>>> - }
>>>>>> + adev->asic_reset_res = adev->resched = 0;
>>>>>> +
>>>>>> +}
>>>>>> +static void amdgpu_lock_adev(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> + mutex_lock(&adev->lock_reset);
>>>>>> + atomic_inc(&adev->gpu_reset_counter);
>>>>>> + adev->in_gpu_reset = 1;
>>>>>> + /* Block kfd */
>>>>>> + amdgpu_amdkfd_pre_reset(adev);
>>>>>> +}
>>>>>> +
>>>>>> +static void amdgpu_unlock_adev(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> /*unlock kfd */
>>>>>> amdgpu_amdkfd_post_reset(adev);
>>>>>> amdgpu_vf_error_trans_all(adev);
>>>>>> adev->in_gpu_reset = 0;
>>>>>> mutex_unlock(&adev->lock_reset);
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +/**
>>>>>> + * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>>>>>> + *
>>>>>> + * @adev: amdgpu device pointer
>>>>>> + * @job: which job trigger hang
>>>>>> + *
>>>>>> + * Attempt to reset the GPU if it has hung (all asics).
>>>>>> + * Attempt to do soft-reset or full-reset and reinitialize Asic
>>>>>> + * Returns 0 for success or an error on failure.
>>>>>> + */
>>>>>> +
>>>>>> +int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>>> + struct amdgpu_job *job)
>>>>>> +{
>>>>>> + int r;
>>>>>> + struct amdgpu_hive_info *hive = NULL;
>>>>>> + bool need_full_reset = false;
>>>>>> + struct amdgpu_device *tmp_adev = NULL;
>>>>>> + struct list_head device_list, *device_list_handle = NULL;
>>>>>> +
>>>>>> + INIT_LIST_HEAD(&device_list);
>>>>>> +
>>>>>> + dev_info(adev->dev, "GPU reset begin!\n");
>>>>>> +
>>>>>> + /*
>>>>>> + * In case of XGMI hive disallow concurrent resets to be triggered
>>>>>> + * by different nodes.
>>>>>> + */
>>>>>> + if (adev->gmc.xgmi.num_physical_nodes > 1)
>>>>>> + mutex_lock(&xgmi_mutex);
>>>>>> +
>>>>>> + /* Start with adev pre asic reset first for soft reset check.*/
>>>>>> + amdgpu_lock_adev(adev);
>>>>>> + r = amdgpu_device_pre_asic_reset(adev,
>>>>>> + job,
>>>>>> + &need_full_reset);
>>>>>> + if (r) {
>>>>>> + /*TODO Should we stop ?*/
>>>>>> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm
>>>>>> dev, %s ",
>>>>>> + r, adev->ddev->unique);
>>>>>> + adev->asic_reset_res = r;
>>>>>> + }
>>>>>> +
>>>>>> + /* Build list of devices to reset */
>>>>>> + if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) {
>>>>>> + hive = amdgpu_get_xgmi_hive(adev);
>>>>>> + if (!hive) {
>>>>>> + r = -ENODEV;
>>>>>> +
>>>>>> + amdgpu_unlock_adev(adev);
>>>>>> +
>>>>>> + if (adev->gmc.xgmi.num_physical_nodes > 1)
>>>>>> + mutex_unlock(&xgmi_mutex);
>>>>>> + return r;
>>>>>> + }
>>>>>> +
>>>>>> + /*
>>>>>> + * In case we are in XGMI hive mode device reset is done for all the
>>>>>> + * nodes in the hive to retrain all XGMI links and hence the reset
>>>>>> + * sequence is executed in loop on all nodes.
>>>>>> + */
>>>>>> + device_list_handle = amdgpu_xgmi_get_adev_list_handle(hive);
>>>>>> + } else {
>>>>>> + list_add_tail(&adev->gmc.xgmi.head, &device_list);
>>>>>> + device_list_handle = &device_list;
>>>>>> + }
>>>>>> +
>>>>>> +retry: /* Rest of adevs pre asic reset from XGMI hive. */
>>>>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>>>>>> +
>>>>>> + if (tmp_adev == adev)
>>>>>> + continue;
>>>>>> +
>>>>>> + dev_info(tmp_adev->dev, "GPU reset begin for drm dev
>>>>>> %s!\n", adev->ddev->unique);
>>>>>> +
>>>>>> + amdgpu_lock_adev(tmp_adev);
>>>>>> + r = amdgpu_device_pre_asic_reset(tmp_adev,
>>>>>> + NULL,
>>>>>> + &need_full_reset);
>>>>>> + /*TODO Should we stop ?*/
>>>>>> + if (r) {
>>>>>> + DRM_ERROR("GPU pre asic reset failed with err, %d for
>>>>>> drm dev, %s ",
>>>>>> + r, tmp_adev->ddev->unique);
>>>>>> + tmp_adev->asic_reset_res = r;
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> + /* Actual ASIC resets if needed.*/
>>>>>> + /* TODO Implement XGMI hive reset logic for SRIOV */
>>>>>> + if (amdgpu_sriov_vf(adev)) {
>>>>>> + r = amdgpu_device_reset_sriov(adev, job ? false : true);
>>>>>> + if (r)
>>>>>> + adev->asic_reset_res = r;
>>>>>> + } else {
>>>>>> + r = amdgpu_do_asic_reset(hive, device_list_handle, &need_full_reset);
>>>>>> + if (r && r == -EAGAIN)
>>>>>> + goto retry;
>>>>>> + }
>>>>>> +
>>>>>> + /* Post ASIC reset for all devs .*/
>>>>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>>>>>> + amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
>>>>>> +
>>>>>> + if (r) {
>>>>>> + /* bad news, how to tell it to userspace ? */
>>>>>> + dev_info(tmp_adev->dev, "GPU reset(%d) failed\n",
>>>>>> atomic_read(&adev->gpu_reset_counter));
>>>>>> + amdgpu_vf_error_put(tmp_adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
>>>>>> + } else {
>>>>>> + dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n",
>>>>>> atomic_read(&adev->gpu_reset_counter));
>>>>>> + }
>>>>>> +
>>>>>> + amdgpu_unlock_adev(tmp_adev);
>>>>>> + }
>>>>>> +
>>>>>> + if (adev->gmc.xgmi.num_physical_nodes > 1)
>>>>>> + mutex_unlock(&xgmi_mutex);
>>>>>> +
>>>>>> + if (r)
>>>>>> + dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
>>>>>> return r;
>>>>>> }
>