<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Ah! So we have one RCU grace period
caused by the scheduler fence and another one from the hardware
fence.<br>
<br>
Thanks for explaining that once more, that was really not obvious
from reading the code.<br>
<br>
But this means we could we also fix it by moving the
"dma_fence_put(fence->parent);" from drm_sched_fence_free()
into drm_sched_fence_release_scheduled() directly before the
call_rcu(), can't we?<br>
<br>
I think that would be cleaner, cause otherwise every driver using
the GPU scheduler would need that workaround.<br>
<br>
And drm_sched_fence_release_scheduled() is only called when the
reference count becomes zero, so when anybody accesses the
structure after that then that would be a bug as well.<br>
<br>
Thanks,<br>
Christian.<br>
<br>
Am 18.05.2018 um 11:56 schrieb Deng, Emily:<br>
</div>
<blockquote type="cite"
cite="mid:BN6PR12MB1121DDBFDB283E01AC198B038F900@BN6PR12MB1121.namprd12.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
<font face="Calibri" size="2"><span style="font-size:11pt;">
<div>Hi Christian,</div>
<div> When we free an IB fence, we first call one
call_rcu in drm_sched_fence_release_scheduled as the call
trace one, then after the call trace one,</div>
<div> we call the call_rcu second in the amdgpu_fence_release
in call trace two, as below.</div>
<div><font face="Times New Roman"> </font></div>
<div> The kmem_cache_free(amdgpu_fence_slab, fence)'s call
trace as below:</div>
<div> 1.drm_sched_entity_fini -></div>
<div> drm_sched_entity_cleanup -></div>
<div> dma_fence_put(entity->last_scheduled) -></div>
<div> drm_sched_fence_release_finished -></div>
<div> drm_sched_fence_release_scheduled -></div>
<div> call_rcu(&fence->finished.rcu, <font
color="red">drm_sched_fence_free</font>)</div>
<div> </div>
<div> 2.<font color="red">drm_sched_fence_free</font> -></div>
<div> dma_fence_put(fence->parent) -></div>
<div> amdgpu_fence_release -></div>
<div> call_rcu(&f->rcu, amdgpu_fence_free) -></div>
<div> kmem_cache_free(amdgpu_fence_slab, fence);</div>
<div><font face="Times New Roman"> </font></div>
<div>> -----Original Message-----</div>
<div>> From: Koenig, Christian</div>
<div>> Sent: Friday, May 18, 2018 5:46 PM</div>
<div>> To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a></div>
<div>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier
after entity fini</div>
<div>> </div>
<div>> Ok, I'm lost where do we use call_rcu() twice? Cause
that sounds incorrect in</div>
<div>> the first place.</div>
<div>> </div>
<div>> Christian.</div>
<div>> </div>
<div>> Am 18.05.2018 um 11:41 schrieb Deng, Emily:</div>
<div>> > Ping......</div>
<div>> ></div>
<div>> > Best Wishes,</div>
<div>> > Emily Deng</div>
<div>> ></div>
<div>> ></div>
<div>> ></div>
<div>> ></div>
<div>> >> -----Original Message-----</div>
<div>> >> From: amd-gfx [<a
href="mailto:amd-gfx-bounces@lists.freedesktop.org"
moz-do-not-send="true">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
On</div>
<div>> >> Behalf Of Deng, Emily</div>
<div>> >> Sent: Friday, May 18, 2018 11:20 AM</div>
<div>> >> To: Koenig, Christian <<a
href="mailto:Christian.Koenig@amd.com"
moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
amd-</div>
<div>> >> <a href="mailto:gfx@lists.freedesktop.org"
moz-do-not-send="true">gfx@lists.freedesktop.org</a></div>
<div>> >> Subject: RE: [PATCH] drm/amdgpu: add
rcu_barrier after entity fini</div>
<div>> >></div>
<div>> >> Hi Christian,</div>
<div>> >> Yes, it has already one rcu_barrier,
but it has called twice</div>
<div>> >> call_rcu, so the one rcu_barrier just could
barrier one call_rcu some time.</div>
<div>> >> After I added another rcu_barrier, the
kernel issue will disappear.</div>
<div>> >></div>
<div>> >> Best Wishes,</div>
<div>> >> Emily Deng</div>
<div>> >></div>
<div>> >>> -----Original Message-----</div>
<div>> >>> From: Christian König [<a
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true">mailto:ckoenig.leichtzumerken@gmail.com</a>]</div>
<div>> >>> Sent: Thursday, May 17, 2018 7:08 PM</div>
<div>> >>> To: Deng, Emily <<a
href="mailto:Emily.Deng@amd.com" moz-do-not-send="true">Emily.Deng@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org"
moz-do-not-send="true">amd-</a></div>
<div>> <a class="moz-txt-link-abbreviated" href="mailto:gfx@lists.freedesktop.org">gfx@lists.freedesktop.org</a></div>
<div>> >>> Subject: Re: [PATCH] drm/amdgpu: add
rcu_barrier after entity fini</div>
<div>> >>></div>
<div>> >>> Am 17.05.2018 um 12:03 schrieb Emily
Deng:</div>
<div>> >>>> To free the fence from the
amdgpu_fence_slab, need twice call_rcu,</div>
<div>> >>>> to avoid the amdgpu_fence_slab_fini
call</div>
<div>> >>>>
kmem_cache_destroy(amdgpu_fence_slab) before</div>
<div>> >>> kmem_cache_free(amdgpu_fence_slab,
fence), add rcu_barrier after</div>
<div>> >>> drm_sched_entity_fini.</div>
<div>> >>>> The
kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as</div>
<div>> below:</div>
<div>> >>>> 1.drm_sched_entity_fini -></div>
<div>> >>>> drm_sched_entity_cleanup -></div>
<div>> >>>>
dma_fence_put(entity->last_scheduled) -></div>
<div>> >>>> drm_sched_fence_release_finished
-></div>
<div>> >>> drm_sched_fence_release_scheduled</div>
<div>> >>>> ->
call_rcu(&fence->finished.rcu, drm_sched_fence_free)</div>
<div>> >>>></div>
<div>> >>>> 2.drm_sched_fence_free -></div>
<div>> >>>> dma_fence_put(fence->parent)
-></div>
<div>> >>>> amdgpu_fence_release -></div>
<div>> >>>> call_rcu(&f->rcu,
amdgpu_fence_free) -></div>
<div>> >>>> kmem_cache_free(amdgpu_fence_slab,
fence);</div>
<div>> >>>></div>
<div>> >>>> v2:put the barrier before the
kmem_cache_destroy</div>
<div>> >>>></div>
<div>> >>>> Change-Id:
I8dcadd3372f97e72461bf46b41cc26d90f09b8df</div>
<div>> >>>> Signed-off-by: Emily Deng <<a
href="mailto:Emily.Deng@amd.com" moz-do-not-send="true">Emily.Deng@amd.com</a>></div>
<div>> >>>> ---</div>
<div>> >>>>
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +</div>
<div>> >>>> 1 file changed, 1 insertion(+)</div>
<div>> >>>></div>
<div>> >>>> diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
<div>> >>>>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
<div>> >>>> index 39ec6b8..42be65b 100644</div>
<div>> >>>> ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
<div>> >>>> +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c</div>
<div>> >>>> @@ -69,6 +69,7 @@ int
amdgpu_fence_slab_init(void)</div>
<div>> >>>> void
amdgpu_fence_slab_fini(void)</div>
<div>> >>>> {</div>
<div>> >>>> rcu_barrier();</div>
<div>> >>>> + rcu_barrier();</div>
<div>> >>> Well, you should have noted that there
is already an rcu_barrier</div>
<div>> >>> here and adding another one shouldn't
have any additional effect. So</div>
<div>> >>> your explanation and the proposed
solution doesn't make to much</div>
<div>> sense.</div>
<div>> >>></div>
<div>> >>> I think the problem you run into is
rather that the fence is</div>
<div>> >>> reference counted and might live longer
than the module who created it.</div>
<div>> >>></div>
<div>> >>> Complicated issue, one possible
solution would be to release</div>
<div>> >>> fence->parent earlier in the
scheduler fence but that doesn't sound</div>
<div>> >>> fence->like</div>
<div>> >>> a general purpose solution.</div>
<div>> >>></div>
<div>> >>> Christian.</div>
<div>> >>></div>
<div>> >>>>
kmem_cache_destroy(amdgpu_fence_slab);</div>
<div>> >>>> }</div>
<div>> >>>> /*</div>
<div>> >>
_______________________________________________</div>
<div>> >> amd-gfx mailing list</div>
<div>> >> <a
href="mailto:amd-gfx@lists.freedesktop.org"
moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a></div>
<div>> >> <a
href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx"
moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a></div>
<div><font face="Times New Roman"> </font></div>
<div><font face="Times New Roman"> </font></div>
</span></font>
</blockquote>
<br>
</body>
</html>