<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Ah! So we have one RCU grace period
      caused by the scheduler fence and another one from the hardware
      fence.<br>
      <br>
      Thanks for explaining that once more, that was really not obvious
      from reading the code.<br>
      <br>
      But this means we could we also fix it by moving the
      "dma_fence_put(fence->parent);" from drm_sched_fence_free()
      into drm_sched_fence_release_scheduled() directly before the
      call_rcu(), can't we?<br>
      <br>
      I think that would be cleaner, cause otherwise every driver using
      the GPU scheduler would need that workaround.<br>
      <br>
      And drm_sched_fence_release_scheduled() is only called when the
      reference count becomes zero, so when anybody accesses the
      structure after that then that would be a bug as well.<br>
      <br>
      Thanks,<br>
      Christian.<br>
      <br>
      Am 18.05.2018 um 11:56 schrieb Deng, Emily:<br>
    </div>
    <blockquote type="cite"
cite="mid:BN6PR12MB1121DDBFDB283E01AC198B038F900@BN6PR12MB1121.namprd12.prod.outlook.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <meta name="Generator" content="Microsoft Exchange Server">
      <!-- converted from rtf -->
      <style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
      <font face="Calibri" size="2"><span style="font-size:11pt;">
          <div>Hi Christian,</div>
          <div>        When we free an IB fence, we first call one
            call_rcu in drm_sched_fence_release_scheduled as the call
            trace one, then after the call trace one,</div>
          <div> we call the call_rcu second in the  amdgpu_fence_release
            in call trace two, as below.</div>
          <div><font face="Times New Roman"> </font></div>
          <div> The kmem_cache_free(amdgpu_fence_slab, fence)'s call
            trace as below:</div>
          <div>    1.drm_sched_entity_fini -></div>
          <div>    drm_sched_entity_cleanup -></div>
          <div>    dma_fence_put(entity->last_scheduled) -></div>
          <div>    drm_sched_fence_release_finished -></div>
          <div>    drm_sched_fence_release_scheduled -></div>
          <div>    call_rcu(&fence->finished.rcu, <font
              color="red">drm_sched_fence_free</font>)</div>
          <div> </div>
          <div>    2.<font color="red">drm_sched_fence_free</font> -></div>
          <div>    dma_fence_put(fence->parent) -></div>
          <div>    amdgpu_fence_release -></div>
          <div>    call_rcu(&f->rcu, amdgpu_fence_free) -></div>
          <div>    kmem_cache_free(amdgpu_fence_slab, fence);</div>
          <div><font face="Times New Roman"> </font></div>
          <div>> -----Original Message-----</div>
          <div>> From: Koenig, Christian</div>
          <div>> Sent: Friday, May 18, 2018 5:46 PM</div>
          <div>> To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></div>
          <div>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier
            after entity fini</div>
          <div>> </div>
          <div>> Ok, I'm lost where do we use call_rcu() twice? Cause
            that sounds incorrect in</div>
          <div>> the first place.</div>
          <div>> </div>
          <div>> Christian.</div>
          <div>> </div>
          <div>> Am 18.05.2018 um 11:41 schrieb Deng, Emily:</div>
          <div>> > Ping......</div>
          <div>> ></div>
          <div>> > Best Wishes,</div>
          <div>> > Emily Deng</div>
          <div>> ></div>
          <div>> ></div>
          <div>> ></div>
          <div>> ></div>
          <div>> >> -----Original Message-----</div>
          <div>> >> From: amd-gfx [<a
              href="mailto:amd-gfx-bounces@lists.freedesktop.org"
              moz-do-not-send="true">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
            On</div>
          <div>> >> Behalf Of Deng, Emily</div>
          <div>> >> Sent: Friday, May 18, 2018 11:20 AM</div>
          <div>> >> To: Koenig, Christian <<a
              href="mailto:Christian.Koenig@amd.com"
              moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
            amd-</div>
          <div>> >> <a href="mailto:gfx@lists.freedesktop.org"
              moz-do-not-send="true">gfx@lists.freedesktop.org</a></div>
          <div>> >> Subject: RE: [PATCH] drm/amdgpu: add
            rcu_barrier after entity fini</div>
          <div>> >></div>
          <div>> >> Hi Christian,</div>
          <div>> >>       Yes, it has already one rcu_barrier,
            but it has called twice</div>
          <div>> >> call_rcu, so the one rcu_barrier just could
            barrier one call_rcu some time.</div>
          <div>> >>      After I added another rcu_barrier, the
            kernel issue will disappear.</div>
          <div>> >></div>
          <div>> >> Best Wishes,</div>
          <div>> >> Emily Deng</div>
          <div>> >></div>
          <div>> >>> -----Original Message-----</div>
          <div>> >>> From: Christian König [<a
              href="mailto:ckoenig.leichtzumerken@gmail.com"
              moz-do-not-send="true">mailto:ckoenig.leichtzumerken@gmail.com</a>]</div>
          <div>> >>> Sent: Thursday, May 17, 2018 7:08 PM</div>
          <div>> >>> To: Deng, Emily <<a
              href="mailto:Emily.Deng@amd.com" moz-do-not-send="true">Emily.Deng@amd.com</a>>;
            <a href="mailto:amd-gfx@lists.freedesktop.org"
              moz-do-not-send="true">amd-</a></div>
          <div>> <a class="moz-txt-link-abbreviated" href="mailto:gfx@lists.freedesktop.org">gfx@lists.freedesktop.org</a></div>
          <div>> >>> Subject: Re: [PATCH] drm/amdgpu: add
            rcu_barrier after entity fini</div>
          <div>> >>></div>
          <div>> >>> Am 17.05.2018 um 12:03 schrieb Emily
            Deng:</div>
          <div>> >>>> To free the fence from the
            amdgpu_fence_slab, need twice call_rcu,</div>
          <div>> >>>> to avoid the amdgpu_fence_slab_fini
            call</div>
          <div>> >>>>
            kmem_cache_destroy(amdgpu_fence_slab) before</div>
          <div>> >>> kmem_cache_free(amdgpu_fence_slab,
            fence), add rcu_barrier after</div>
          <div>> >>> drm_sched_entity_fini.</div>
          <div>> >>>> The
            kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as</div>
          <div>> below:</div>
          <div>> >>>> 1.drm_sched_entity_fini -></div>
          <div>> >>>> drm_sched_entity_cleanup -></div>
          <div>> >>>>
            dma_fence_put(entity->last_scheduled) -></div>
          <div>> >>>> drm_sched_fence_release_finished
            -></div>
          <div>> >>> drm_sched_fence_release_scheduled</div>
          <div>> >>>> ->
            call_rcu(&fence->finished.rcu, drm_sched_fence_free)</div>
          <div>> >>>></div>
          <div>> >>>> 2.drm_sched_fence_free -></div>
          <div>> >>>> dma_fence_put(fence->parent)
            -></div>
          <div>> >>>> amdgpu_fence_release -></div>
          <div>> >>>> call_rcu(&f->rcu,
            amdgpu_fence_free) -></div>
          <div>> >>>> kmem_cache_free(amdgpu_fence_slab,
            fence);</div>
          <div>> >>>></div>
          <div>> >>>> v2:put the barrier before the
            kmem_cache_destroy</div>
          <div>> >>>></div>
          <div>> >>>> Change-Id:
            I8dcadd3372f97e72461bf46b41cc26d90f09b8df</div>
          <div>> >>>> Signed-off-by: Emily Deng <<a
              href="mailto:Emily.Deng@amd.com" moz-do-not-send="true">Emily.Deng@amd.com</a>></div>
          <div>> >>>> ---</div>
          <div>> >>>>   
            drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +</div>
          <div>> >>>>    1 file changed, 1 insertion(+)</div>
          <div>> >>>></div>
          <div>> >>>> diff --git
            a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
          <div>> >>>>
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
          <div>> >>>> index 39ec6b8..42be65b 100644</div>
          <div>> >>>> ---
            a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
          <div>> >>>> +++
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
          <div>> >>>> @@ -69,6 +69,7 @@ int
            amdgpu_fence_slab_init(void)</div>
          <div>> >>>>    void
            amdgpu_fence_slab_fini(void)</div>
          <div>> >>>>    {</div>
          <div>> >>>>           rcu_barrier();</div>
          <div>> >>>> +        rcu_barrier();</div>
          <div>> >>> Well, you should have noted that there
            is already an rcu_barrier</div>
          <div>> >>> here and adding another one shouldn't
            have any additional effect. So</div>
          <div>> >>> your explanation and the proposed
            solution doesn't make to much</div>
          <div>> sense.</div>
          <div>> >>></div>
          <div>> >>> I think the problem you run into is
            rather that the fence is</div>
          <div>> >>> reference counted and might live longer
            than the module who created it.</div>
          <div>> >>></div>
          <div>> >>> Complicated issue, one possible
            solution would be to release</div>
          <div>> >>> fence->parent earlier in the
            scheduler fence but that doesn't sound</div>
          <div>> >>> fence->like</div>
          <div>> >>> a general purpose solution.</div>
          <div>> >>></div>
          <div>> >>> Christian.</div>
          <div>> >>></div>
          <div>> >>>>          
            kmem_cache_destroy(amdgpu_fence_slab);</div>
          <div>> >>>>    }</div>
          <div>> >>>>    /*</div>
          <div>> >>
            _______________________________________________</div>
          <div>> >> amd-gfx mailing list</div>
          <div>> >> <a
              href="mailto:amd-gfx@lists.freedesktop.org"
              moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a></div>
          <div>> >> <a
              href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx"
              moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a></div>
          <div><font face="Times New Roman"> </font></div>
          <div><font face="Times New Roman"> </font></div>
        </span></font>
    </blockquote>
    <br>
  </body>
</html>