[PATCH] drm/amdgpu: add rcu_barrier after entity fini

Deng, Emily Emily.Deng at amd.com
Fri May 18 09:56:58 UTC 2018


Hi Christian,
        When we free an IB fence, we first call one call_rcu in drm_sched_fence_release_scheduled as the call trace one, then after the call trace one,
 we call the call_rcu second in the  amdgpu_fence_release in call trace two, as below.

 The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as below:
    1.drm_sched_entity_fini ->
    drm_sched_entity_cleanup ->
    dma_fence_put(entity->last_scheduled) ->
    drm_sched_fence_release_finished ->
    drm_sched_fence_release_scheduled ->
    call_rcu(&fence->finished.rcu, drm_sched_fence_free)

    2.drm_sched_fence_free ->
    dma_fence_put(fence->parent) ->
    amdgpu_fence_release ->
    call_rcu(&f->rcu, amdgpu_fence_free) ->
    kmem_cache_free(amdgpu_fence_slab, fence);

> -----Original Message-----
> From: Koenig, Christian
> Sent: Friday, May 18, 2018 5:46 PM
> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
>
> Ok, I'm lost where do we use call_rcu() twice? Cause that sounds incorrect in
> the first place.
>
> Christian.
>
> Am 18.05.2018 um 11:41 schrieb Deng, Emily:
> > Ping......
> >
> > Best Wishes,
> > Emily Deng
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
> >> Behalf Of Deng, Emily
> >> Sent: Friday, May 18, 2018 11:20 AM
> >> To: Koenig, Christian <Christian.Koenig at amd.com<mailto:Christian.Koenig at amd.com>>; amd-
> >> gfx at lists.freedesktop.org<mailto:gfx at lists.freedesktop.org>
> >> Subject: RE: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
> >>
> >> Hi Christian,
> >>       Yes, it has already one rcu_barrier, but it has called twice
> >> call_rcu, so the one rcu_barrier just could barrier one call_rcu some time.
> >>      After I added another rcu_barrier, the kernel issue will disappear.
> >>
> >> Best Wishes,
> >> Emily Deng
> >>
> >>> -----Original Message-----
> >>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> >>> Sent: Thursday, May 17, 2018 7:08 PM
> >>> To: Deng, Emily <Emily.Deng at amd.com<mailto:Emily.Deng at amd.com>>; amd-<mailto:amd-gfx at lists.freedesktop.org>
> gfx at lists.freedesktop.org
> >>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
> >>>
> >>> Am 17.05.2018 um 12:03 schrieb Emily Deng:
> >>>> To free the fence from the amdgpu_fence_slab, need twice call_rcu,
> >>>> to avoid the amdgpu_fence_slab_fini call
> >>>> kmem_cache_destroy(amdgpu_fence_slab) before
> >>> kmem_cache_free(amdgpu_fence_slab, fence), add rcu_barrier after
> >>> drm_sched_entity_fini.
> >>>> The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as
> below:
> >>>> 1.drm_sched_entity_fini ->
> >>>> drm_sched_entity_cleanup ->
> >>>> dma_fence_put(entity->last_scheduled) ->
> >>>> drm_sched_fence_release_finished ->
> >>> drm_sched_fence_release_scheduled
> >>>> -> call_rcu(&fence->finished.rcu, drm_sched_fence_free)
> >>>>
> >>>> 2.drm_sched_fence_free ->
> >>>> dma_fence_put(fence->parent) ->
> >>>> amdgpu_fence_release ->
> >>>> call_rcu(&f->rcu, amdgpu_fence_free) ->
> >>>> kmem_cache_free(amdgpu_fence_slab, fence);
> >>>>
> >>>> v2:put the barrier before the kmem_cache_destroy
> >>>>
> >>>> Change-Id: I8dcadd3372f97e72461bf46b41cc26d90f09b8df
> >>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com<mailto:Emily.Deng at amd.com>>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
> >>>>    1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> index 39ec6b8..42be65b 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>> @@ -69,6 +69,7 @@ int amdgpu_fence_slab_init(void)
> >>>>    void amdgpu_fence_slab_fini(void)
> >>>>    {
> >>>>          rcu_barrier();
> >>>> +        rcu_barrier();
> >>> Well, you should have noted that there is already an rcu_barrier
> >>> here and adding another one shouldn't have any additional effect. So
> >>> your explanation and the proposed solution doesn't make to much
> sense.
> >>>
> >>> I think the problem you run into is rather that the fence is
> >>> reference counted and might live longer than the module who created it.
> >>>
> >>> Complicated issue, one possible solution would be to release
> >>> fence->parent earlier in the scheduler fence but that doesn't sound
> >>> fence->like
> >>> a general purpose solution.
> >>>
> >>> Christian.
> >>>
> >>>>          kmem_cache_destroy(amdgpu_fence_slab);
> >>>>    }
> >>>>    /*
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180518/5431cd34/attachment.html>


More information about the amd-gfx mailing list