[PATCH 2/2] drm: add tdr support for embedded hw_fence

Jingwen Chen Jingwen.Chen2 at amd.com
Fri Jul 23 04:36:12 UTC 2021


On Fri Jul 23, 2021 at 12:06:32AM -0400, Andrey Grodzovsky wrote:
> 
> On 2021-07-22 8:20 p.m., Jingwen Chen wrote:
> > On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote:
> > > On 2021-07-22 1:27 p.m., Jingwen Chen wrote:
> > > > On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote:
> > > > > On 2021-07-22 12:47 p.m., Jingwen Chen wrote:
> > > > > > On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
> > > > > > > Am 22.07.21 um 16:45 schrieb Andrey Grodzovsky:
> > > > > > > > On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
> > > > > > > > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
> > > > > > > > > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote:
> > > > > > > > > > > [Why]
> > > > > > > > > > > After embedding the hw_fence into amdgpu_job, we need to add tdr support
> > > > > > > > > > > for this feature.
> > > > > > > > > > > 
> > > > > > > > > > > [How]
> > > > > > > > > > > 1. Add a resubmit_flag for resubmit jobs.
> > > > > > > > > > > 2. Clear job fence from RCU and force complete vm flush fences in
> > > > > > > > > > >         pre_asic_reset
> > > > > > > > > > > 3. skip dma_fence_get for resubmit jobs and add a dma_fence_put
> > > > > > > > > > >         for guilty jobs.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> > > > > > > > > > > Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
> > > > > > > > > > > ---
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++++++++++-
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 16 +++++++++++-----
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  4 +++-
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_main.c     |  1 +
> > > > > > > > > > >       include/drm/gpu_scheduler.h                |  1 +
> > > > > > > > > > >       5 files changed, 27 insertions(+), 7 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > > > > > > > > index 40461547701a..fe0237f72a09 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > > > > > > > > > @@ -4382,7 +4382,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
> > > > > > > > > > >       int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
> > > > > > > > > > >                        struct amdgpu_reset_context *reset_context)
> > > > > > > > > > >       {
> > > > > > > > > > > -    int i, r = 0;
> > > > > > > > > > > +    int i, j, r = 0;
> > > > > > > > > > >           struct amdgpu_job *job = NULL;
> > > > > > > > > > >           bool need_full_reset =
> > > > > > > > > > >               test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
> > > > > > > > > > > @@ -4406,6 +4406,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
> > > > > > > > > > >               if (!ring || !ring->sched.thread)
> > > > > > > > > > >                   continue;
> > > > > > > > > > > +        /* clear job fences from fence drv to avoid force_completion,
> > > > > > > > > > > +         * leaving only NULL and vm flush fences in fence drv */
> > > > > > > > > > > +        for (j = 0; j <= ring->fence_drv.num_fences_mask; j++) {
> > > > > > > > > > > +            struct dma_fence *old, **ptr;
> > > > > > > > > > > +            ptr = &ring->fence_drv.fences[j];
> > > > > > > > > > > +            old = rcu_dereference_protected(*ptr, 1);
> > > > > > > > > > > +            if (old && test_bit(DMA_FENCE_FLAG_USER_BITS, &old->flags)) {
> > > > > > > > > > > +                RCU_INIT_POINTER(*ptr, NULL);
> > > > > > > > > > > +            }
> > > > > > > > > > Is this to avoid premature job free because of dma_fence_put inside
> > > > > > > > > > amdgpu_fence_process ?
> > > > > > > > > > I can't currently remember why but we probably want all the HW fences
> > > > > > > > > > currently in the ring to
> > > > > > > > > > be forced signaled - maybe better to test for DMA_FENCE_FLAG_USER_BITS
> > > > > > > > > > inside amdgpu_fence_process
> > > > > > > > > > and still do the signaling but not the dma_fence_put part
> > > > > > > > > > 
> > > > > > > > > > Andrey
> > > > > > > > > Hi Andrey,
> > > > > > > > > 
> > > > > > > > > This is to avoid signaling the same fence twice. If we still do the
> > > > > > > > > signaling, then the job in the pending list will be signaled first in
> > > > > > > > > force_completion, and later be signaled in resubmit. This will go to
> > > > > > > > > BUG() in amdgpu_fence_process.
> > > > > > > > Oh, I see. How about just adding a 'skip' flag to amdgpu_ring, setting it
> > > > > > > > before calling amdgpu_fence_driver_force_completion and resetting it
> > > > > > > > afterwards; then inside amdgpu_fence_driver_force_completion you can just
> > > > > > > > skip the signaling part for fences with DMA_FENCE_FLAG_USER_BITS set while
> > > > > > > > this flag is on. Fewer lines of code at least.
> > > > > > > Still sounds quite a bit hacky.
> > > > > > > 
> > > > > > > I would rather suggest to completely drop the approach with
> > > > > > > amdgpu_fence_driver_force_completion(). I could never see why we would want
> > > > > > > that in the first place.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > Hi Christian,
> > > > > > 
> > > > > > I keep the amdgpu_fence_driver_force_completion here to make sure the vm
> > > > > > flush fence is signaled and put.
> > > > > Right, so we need to do force completion for the sake of all the fences
> > > > > without parent jobs since there is code which wait directly on them.
> > > > > 
> > > > > 
> > > > > > So the key question is whether the fence of ib test should be the first
> > > > > > fence to be signaled after reset.
> > > > > What do you mean by 'after reset' - at this point in the code there was
> > > > > no ASIC reset yet, we just stopped the schedulers and detached the
> > > > > HW fences from their parent jobs (sched_fence)
> > > > I mean the ASIC reset. There will be a ib_test for each ring after ASIC
> > > > reset.
> > > 
> > > Then why wouldn't they be the first? They will emit new fences into the
> > > ring, which will be signaled right away because the ASIC went through
> > > reset and is not hung anymore. All the previous fences, including the VM
> > > flush ones from before the reset, will already have been signaled by that
> > > time from amdgpu_fence_driver_force_completion.
> > > 
> > Hi Andrey,
> > 
> > Sorry I didn't make it clear. I mean if we drop force_completion here,
> > and keep other code unchanged, then the ib_test wouldn't be the first to
> > be signaled.
> 
> 
> At least in my opinion the order of who gets signaled first is not
> important. I do think it's important to force signal all the old fences
> before the HW reset, for two reasons. 1st - it's conceptually wrong to
> leave them unsignaled and let them be signaled post ASIC reset, because
> it's not them who actually completed whatever they were doing; rather the
> ASIC reset allowed later fences (with higher sync_seq, e.g. the ring test)
> to signal, and those earlier remaining fences (e.g. VM_FLUSH) were just
> signaled as part of that. 2nd - possible deadlock where
> ring->fence_drv.fences is full of fences before the ASIC reset because we
> don't force signal and hence don't empty it; then after the reset we call
> amdgpu_fence_emit as part of the ib_test, but it dereferences an old, still
> unsignaled fence and does a dma_fence_wait to make sure it completed, which
> it didn't and never will, because the EOP ISR only fires after an actual
> fence emission, which is exactly what the wait blocks.
> 
> Andrey
Hi Andrey,

Yes, I forgot to mention the deadlock previously. I actually already hit
exactly that deadlock during debugging, for the same reason you describe.

I totally agree with not dropping force_completion.
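
For reference, the blocking path looks roughly like this; a simplified
sketch of the relevant part of amdgpu_fence_emit (not the exact upstream
code), assuming the slot in ring->fence_drv.fences still holds an old,
unsignaled fence because nothing force-completed it before the reset:

    struct dma_fence **ptr, *old;

    /* slot in the fence ring that the new seq number maps to */
    ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
    old = rcu_dereference_protected(*ptr, 1);
    if (old && !dma_fence_is_signaled(old)) {
        /* The EOP interrupt that would signal the old fence can only be
         * triggered by emitting a later fence, which is exactly what this
         * wait blocks, so the ib_test after the ASIC reset never gets
         * past this point. */
        dma_fence_wait(old, false);
    }
    rcu_assign_pointer(*ptr, dma_fence_get(fence));

So dropping force_completion without draining the fence array first would
reintroduce that hang.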
> 
> 
> > 
> > 
> > Best Regards,
> > JingWen Chen
> > > > > > If it should be, it means not only fences with DMA_FENCE_FLAG_USER_BITS
> > > > > > but also vm flush fences should be removed from RCU fence array before
> > > > > > ib_test.
> > > 
> > > As I mentioned above, not clear to me why VM fences should get special
> > > treatment here.
> > > 
> > > Andrey
> > > 
> > > 
> > > > > Which ib_test is it, the one after ASIC reset ?
> > > > > 
> > > > Yes
> > > > 
> > > > 
> > > > Best Regards,
> > > > JingWen Chen
> > > > > Andrey
> > > > > 
> > > > > > Then we must do the force_completion here for the vm flush fence,
> > > > > > otherwise there will be a memory leak, as no one will signal and put it
> > > > > > after we remove it from the RCU array.
> > > > > > If not, then the first fence to signal could be the vm flush fence, and
> > > > > > I'm OK with dropping amdgpu_fence_driver_force_completion here.
> > > > > > 
> > > > > > 
> > > > > > Best Regards,
> > > > > > JingWen Chen
> > > > > > > > Andrey
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > > > +        }
> > > > > > > > > > >               /* after all hw jobs are reset, hw fence is meaningless, so force_completion */
> > > > > > > > > > >               amdgpu_fence_driver_force_completion(ring);
> > > > > > > > > > >           }
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > > > > > > > > > index eecf21d8ec33..815776c9a013 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > > > > > > > > > @@ -156,11 +156,17 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
> > > > > > > > > > >               job->ring = ring;
> > > > > > > > > > >           }
> > > > > > > > > > > -    seq = ++ring->fence_drv.sync_seq;
> > > > > > > > > > > -    dma_fence_init(fence, &amdgpu_fence_ops,
> > > > > > > > > > > -               &ring->fence_drv.lock,
> > > > > > > > > > > -               adev->fence_context + ring->idx,
> > > > > > > > > > > -               seq);
> > > > > > > > > > > +    if (job != NULL && job->base.resubmit_flag == 1) {
> > > > > > > > > > > +        /* reinit seq for resubmitted jobs */
> > > > > > > > > > > +        seq = ++ring->fence_drv.sync_seq;
> > > > > > > > > > > +        fence->seqno = seq;
> > > > > > > > > > > +    } else {
> > > > > > > > > > > +        seq = ++ring->fence_drv.sync_seq;
> > > > > > > > > > Seems like you could do the above line only once above if-else
> > > > > > > > > > as it was
> > > > > > > > > > before
> > > > > > > > > Sure, I will modify this.
> > > > > > > > > 
> > > > > > > > > 
I will modify this part and upload a v2 patch. Can you help add an Ack or
Reviewed-by?
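
For reference, the modified hunk would look roughly like this (just a
sketch, the actual v2 may differ slightly):

    seq = ++ring->fence_drv.sync_seq;
    if (job && job->base.resubmit_flag) {
        /* resubmitted job: the embedded hw_fence was already initialized
         * on the first submission, so only refresh its seqno */
        fence->seqno = seq;
    } else {
        dma_fence_init(fence, &amdgpu_fence_ops,
                       &ring->fence_drv.lock,
                       adev->fence_context + ring->idx,
                       seq);
    }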

Best Regards,
JingWen Chen
> > > > > > > > > Best Regards,
> > > > > > > > > JingWen Chen
> > > > > > > > > > > +        dma_fence_init(fence, &amdgpu_fence_ops,
> > > > > > > > > > > +                &ring->fence_drv.lock,
> > > > > > > > > > > +                adev->fence_context + ring->idx,
> > > > > > > > > > > +                seq);
> > > > > > > > > > > +    }
> > > > > > > > > > >           if (job != NULL) {
> > > > > > > > > > >               /* mark this fence has a parent job */
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > index 7c426e225b24..d6f848adc3f4 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > @@ -241,6 +241,7 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
> > > > > > > > > > >               dma_fence_set_error(finished, -ECANCELED); /* skip IB as well if VRAM lost */
> > > > > > > > > > >           if (finished->error < 0) {
> > > > > > > > > > > +        dma_fence_put(&job->hw_fence);
> > > > > > > > > > >               DRM_INFO("Skip scheduling IBs!\n");
> > > > > > > > > > >           } else {
> > > > > > > > > > >               r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, job,
> > > > > > > > > > > @@ -249,7 +250,8 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
> > > > > > > > > > >                   DRM_ERROR("Error scheduling IBs (%d)\n", r);
> > > > > > > > > > >           }
> > > > > > > > > > > -    dma_fence_get(fence);
> > > > > > > > > > > +    if (!job->base.resubmit_flag)
> > > > > > > > > > > +        dma_fence_get(fence);
> > > > > > > > > > >           amdgpu_job_free_resources(job);
> > > > > > > > > > >           fence = r ? ERR_PTR(r) : fence;
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > index f4f474944169..5a36ab5aea2d 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > @@ -544,6 +544,7 @@ void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max)
> > > > > > > > > > > dma_fence_set_error(&s_fence->finished, -ECANCELED);
> > > > > > > > > > >               dma_fence_put(s_job->s_fence->parent);
> > > > > > > > > > > +        s_job->resubmit_flag = 1;
> > > > > > > > > > >               fence = sched->ops->run_job(s_job);
> > > > > > > > > > >               i++;
> > > > > > > > > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > > > > > > > > index 4ea8606d91fe..06c101af1f71 100644
> > > > > > > > > > > --- a/include/drm/gpu_scheduler.h
> > > > > > > > > > > +++ b/include/drm/gpu_scheduler.h
> > > > > > > > > > > @@ -198,6 +198,7 @@ struct drm_sched_job {
> > > > > > > > > > >           enum drm_sched_priority        s_priority;
> > > > > > > > > > >           struct drm_sched_entity         *entity;
> > > > > > > > > > >           struct dma_fence_cb        cb;
> > > > > > > > > > > +    int resubmit_flag;
> > > > > > > > > > >       };
> > > > > > > > > > >       static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
> > > > > > _______________________________________________
> > > > > > amd-gfx mailing list
> > > > > > amd-gfx at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

