[PATCH 2/2] drm: add tdr support for embeded hw_fence

Fri Jul 23 06:33:02 UTC 2021

Am 22.07.21 um 18:47 schrieb Jingwen Chen:
> On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
>> Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky:
>>> On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
>>>> On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
>>>>> On 2021-07-20 11:13 p.m., Jingwen Chen wrote:
>>>>>> [Why]
>>>>>> After embeded hw_fence to amdgpu_job, we need to add tdr support
>>>>>> for this feature.
>>>>>>
>>>>>> [How]
>>>>>> 1. Add a resubmit_flag for resubmit jobs.
>>>>>> 2. Clear job fence from RCU and force complete vm flush fences in
>>>>>>       pre_asic_reset
>>>>>> 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put
>>>>>>       for guilty jobs.
>>>>>>
>>>>>> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
>>>>>> Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++++++++++-
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 16 +++++++++++-----
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 +++-
>>>>>>     drivers/gpu/drm/scheduler/sched_main.c     |  1 +
>>>>>>     include/drm/gpu_scheduler.h                |  1 +
>>>>>>     5 files changed, 27 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index 40461547701a..fe0237f72a09 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct
>>>>>> amdgpu_device *adev)
>>>>>>     int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>>>>>                      struct amdgpu_reset_context *reset_context)
>>>>>>     {
>>>>>> -    int i, r = 0;
>>>>>> +    int i, j, r = 0;
>>>>>>         struct amdgpu_job *job = NULL;
>>>>>>         bool need_full_reset =
>>>>>>             test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
>>>>>> @@ -4406,6 +4406,16 @@ int
>>>>>> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>>>>>             if (!ring || !ring->sched.thread)
>>>>>>                 continue;
>>>>>> +        /*clear job fence from fence drv to avoid force_completion
>>>>>> +         *leave NULL and vm flush fence in fence drv */
>>>>>> +        for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) {
>>>>>> +            struct dma_fence *old,**ptr;
>>>>>> +            ptr = &ring->fence_drv.fences[j];
>>>>>> +            old = rcu_dereference_protected(*ptr, 1);
>>>>>> +            if (old && test_bit(DMA_FENCE_FLAG_USER_BITS,
>>>>>> &old->flags))) {
>>>>>> +                RCU_INIT_POINTER(*ptr, NULL);
>>>>>> +            }
>>>>> Is this to avoid premature job free because of dma_fence_put inside
>>>>> amdgpu_fence_process ?
>>>>> I can't currently remember why but we probably want all the HW fences
>>>>> currently in the ring to
>>>>> be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS
>>>>> inside amdgpu_fence_process
>>>>> and still do the signaling but not the dma_fence_put part
>>>>>
>>>>> Andrey
>>>> Hi Andrey,
>>>>
>>>> This is to avoid signaling the same fence twice. If we still do the
>>>> signaling, then the job in the pending list will be signaled first in
>>>> force_completion, and later be signaled in resubmit. This will go to
>>>> BUG() in amdgpu_fence_process.
>>>
>>> Oh, i see, how about just adding 'skip' flag to amdgpu_ring and setting
>>> it before calling
>>> amdgpu_fence_driver_force_completion and resetting it after, then inside
>>> amdgpu_fence_driver_force_completion
>>> you can just skip the signaling part with this flag for fences with
>>> DMA_FENCE_FLAG_USER_BITS set
>>> Less lines of code at least.
>> Still sounds quite a bit hacky.
>>
>> I would rather suggest to completely drop the approach with
>> amdgpu_fence_driver_force_completion(). I could never see why we would want
>> that in the first place.
>>
>> Regards,
>> Christian.
>>
> Hi Christian,
>
> I keep the amdgpu_fence_driver_force_completion here to make sure the vm
> flush fence is signaled and put.
> So the key question is whether the fence of ib test should be the first
> fence to be signaled after reset.
> If it should be, it means not only fences with DMA_FENCE_FLAG_USER_BITS
> but also vm flush fences should be removed from RCU fence array before
> ib_test. Then we must do the force_completion here for vm flush
> fence, otherwise there will be a memory leak here as no one will signal
> and put it after we remove it from RCU.
> If not, then the first fence to signal could be vm flush fence. And I'm
> OK with drop amdgpu_fence_driver_force_completion here.

The key problem is that amdgpu_fence_driver_force_completion() goes over 
the RCU array and signals everything there.

What we should do instead is to go over the jobs and cleanup the ones we 
don't want to re-submit and keep the rest.

And please drop any usage of DMA_FENCE_FLAG_USER_BITS. That is not 
something which should be abused for reset handling.

Regards,
Christian.

>
>
> Best Regards,
> JingWen Chen
>>> Andrey
>>>
>>>
>>>>>> +        }
>>>>>>             /* after all hw jobs are reset, hw fence is
>>>>>> meaningless, so force_completion */
>>>>>>             amdgpu_fence_driver_force_completion(ring);
>>>>>>         }
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> index eecf21d8ec33..815776c9a013 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct
>>>>>> amdgpu_ring *ring, struct dma_fence **f, struct amd
>>>>>>             job->ring = ring;
>>>>>>         }
>>>>>> -    seq = ++ring->fence_drv.sync_seq;
>>>>>> -    dma_fence_init(fence, &amdgpu_fence_ops,
>>>>>> -               &ring->fence_drv.lock,
>>>>>> -               adev->fence_context + ring->idx,
>>>>>> -               seq);
>>>>>> +    if (job != NULL && job->base.resubmit_flag == 1) {
>>>>>> +        /* reinit seq for resubmitted jobs */
>>>>>> +        seq = ++ring->fence_drv.sync_seq;
>>>>>> +        fence->seqno = seq;
>>>>>> +    } else {
>>>>>> +        seq = ++ring->fence_drv.sync_seq;
>>>>> Seems like you could do the above line only once above if-else
>>>>> as it was
>>>>> before
>>>> Sure, I will modify this.
>>>>
>>>>
>>>> Best Regards,
>>>> JingWen Chen
>>>>>> +        dma_fence_init(fence, &amdgpu_fence_ops,
>>>>>> +                &ring->fence_drv.lock,
>>>>>> +                adev->fence_context + ring->idx,
>>>>>> +                seq);
>>>>>> +    }
>>>>>>         if (job != NULL) {
>>>>>>             /* mark this fence has a parent job */
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> index 7c426e225b24..d6f848adc3f4 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> @@ -241,6 +241,7 @@ static struct dma_fence
>>>>>> *amdgpu_job_run(struct drm_sched_job *sched_job)
>>>>>>             dma_fence_set_error(finished, -ECANCELED);/* skip
>>>>>> IB as well if VRAM lost */
>>>>>>         if (finished->error < 0) {
>>>>>> +        dma_fence_put(&job->hw_fence);
>>>>>>             DRM_INFO("Skip scheduling IBs!\n");
>>>>>>         } else {
>>>>>>             r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job,
>>>>>> @@ -249,7 +250,8 @@ static struct dma_fence
>>>>>> *amdgpu_job_run(struct drm_sched_job *sched_job)
>>>>>>                 DRM_ERROR("Error scheduling IBs (%d)\n", r);
>>>>>>         }
>>>>>> -    dma_fence_get(fence);
>>>>>> +    if (!job->base.resubmit_flag)
>>>>>> +        dma_fence_get(fence);
>>>>>>         amdgpu_job_free_resources(job);
>>>>>>         fence = r ? ERR_PTR(r) : fence;
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index f4f474944169..5a36ab5aea2d 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -544,6 +544,7 @@ void drm_sched_resubmit_jobs_ext(struct
>>>>>> drm_gpu_scheduler *sched, int max)
>>>>>> dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>>>>>             dma_fence_put(s_job->s_fence->parent);
>>>>>> +        s_job->resubmit_flag = 1;
>>>>>>             fence = sched->ops->run_job(s_job);
>>>>>>             i++;
>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>> index 4ea8606d91fe..06c101af1f71 100644
>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>> @@ -198,6 +198,7 @@ struct drm_sched_job {
>>>>>>         enum drm_sched_priority        s_priority;
>>>>>>         struct drm_sched_entity         *entity;
>>>>>>         struct dma_fence_cb        cb;
>>>>>> +    int resubmit_flag;
>>>>>>     };
>>>>>>     static inline bool drm_sched_invalidate_job(struct
>>>>>> drm_sched_job *s_job,