[PATCH] drm/amdgpu: add rcu_barrier after entity fini

Deng, Emily Emily.Deng at amd.com
Fri May 18 03:20:11 UTC 2018


Hi Christian,
     Yes, it has already one rcu_barrier, but it has called twice call_rcu, so the one rcu_barrier just could barrier one call_rcu some time.
    After I added another rcu_barrier, the kernel issue will disappear.

Best Wishes,
Emily Deng

> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> Sent: Thursday, May 17, 2018 7:08 PM
> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
> 
> Am 17.05.2018 um 12:03 schrieb Emily Deng:
> > To free the fence from the amdgpu_fence_slab, need twice call_rcu, to
> > avoid the amdgpu_fence_slab_fini call
> > kmem_cache_destroy(amdgpu_fence_slab) before
> kmem_cache_free(amdgpu_fence_slab, fence), add rcu_barrier after
> drm_sched_entity_fini.
> >
> > The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as below:
> > 1.drm_sched_entity_fini ->
> > drm_sched_entity_cleanup ->
> > dma_fence_put(entity->last_scheduled) ->
> > drm_sched_fence_release_finished ->
> drm_sched_fence_release_scheduled
> > -> call_rcu(&fence->finished.rcu, drm_sched_fence_free)
> >
> > 2.drm_sched_fence_free ->
> > dma_fence_put(fence->parent) ->
> > amdgpu_fence_release ->
> > call_rcu(&f->rcu, amdgpu_fence_free) ->
> > kmem_cache_free(amdgpu_fence_slab, fence);
> >
> > v2:put the barrier before the kmem_cache_destroy
> >
> > Change-Id: I8dcadd3372f97e72461bf46b41cc26d90f09b8df
> > Signed-off-by: Emily Deng <Emily.Deng at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 39ec6b8..42be65b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -69,6 +69,7 @@ int amdgpu_fence_slab_init(void)
> >   void amdgpu_fence_slab_fini(void)
> >   {
> >   	rcu_barrier();
> > +	rcu_barrier();
> 
> Well, you should have noted that there is already an rcu_barrier here and
> adding another one shouldn't have any additional effect. So your explanation
> and the proposed solution doesn't make to much sense.
> 
> I think the problem you run into is rather that the fence is reference counted
> and might live longer than the module who created it.
> 
> Complicated issue, one possible solution would be to release
> fence->parent earlier in the scheduler fence but that doesn't sound like
> a general purpose solution.
> 
> Christian.
> 
> >   	kmem_cache_destroy(amdgpu_fence_slab);
> >   }
> >   /*



More information about the amd-gfx mailing list