<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">
<blockquote type="cite">Without correctly kunmap, page table
is corrupted. Page entries point to wrong memory locations. You
might call it completely harmless. But I think this is a severe
bug. Leaking memory is better than a corrupted page table. Think
security.</blockquote>
We are talking about the page tables for the vmalloc area in the
kernel here, so no security problem. Leaking memory is much more
problematic.<br>
<br>
<blockquote type="cite">
<p>Would you provide any document and reference by saying"<span
style="font-family: Calibri, Arial, Helvetica, sans-serif,
EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji',
NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;"> It is impossible to receive
a signal during module load/unload"? For example, if the
unload stuck in a lock, can CTRL+C stop the unload?</span></p>
</blockquote>
No, CTRL+C doesn't abort module load/unload. There have been
patches to changes this a while ago, but IIRC it broke a whole
bunch of driver relying on this.<br>
<br>
<blockquote type="cite">
<p><span style="font-size: 12pt; font-family: Calibri, Arial,
Helvetica, sans-serif;">What about there is some other
return error? What about in future somebody improve <span
style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 16px;">amdgpu_bo_reserve to return other
errors, then function <span>amdgpu_vram_scratch_fini</span>
becomes buggy?</span></span></p>
</blockquote>
Yes, that is indeed an issue. For example -EDEADLK is possible as
well. That's why I said we should use amdgpu_bo_free_kernel()
instead.<br>
<br>
<blockquote type="cite">
<p>While I am thinking whether there is a better way for the
current situation, would you give a real world example that my
patch really not working? Then we can address it.</p>
</blockquote>
I don't think there is because the driver can't receive a signal
during load/unload, but the problem is rather that the patch
doesn't improve the situation at all.<br>
<br>
Regards,<br>
Christian.<br>
<br>
Am 19.04.2017 um 13:37 schrieb Xie, AlexBin:<br>
</div>
<blockquote
cite="mid:MWHPR1201MB0045EC51F6CE7C6C4283888CF2180@MWHPR1201MB0045.namprd12.prod.outlook.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;"
dir="ltr">
<p>Hi Christian,</p>
<p><br>
</p>
<p>Without correctly kunmap, page table is corrupted. Page
entries point to wrong memory locations. You might call it
completely harmless. But I think this is a severe bug. Leaking
memory is better than a corrupted page table. Think security.</p>
<p><br>
</p>
<p>Would you provide any document and reference by saying"<span
style="font-family: Calibri, Arial, Helvetica, sans-serif,
EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji',
NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;"> It is impossible to receive
a signal during module load/unload"? For example, if the
unload stuck in a lock, can CTRL+C stop the unload?</span></p>
<p><span style="font-family: Calibri, Arial, Helvetica,
sans-serif, EmojiFont, 'Apple Color Emoji', 'Segoe UI
Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;"><br>
</span></p>
<p><span style="font-family: Calibri, Arial, Helvetica,
sans-serif, EmojiFont, 'Apple Color Emoji', 'Segoe UI
Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;">If "<span
style="font-family: Calibri, Arial, Helvetica, sans-serif,
EmojiFont, 'Apple Color Emoji', 'Segoe UI Emoji',
NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;">It is impossible to
receive a signal during module load/unload", interruptible
waiting is fine too, because function </span></span><span
style="font-size: 12pt; font-family: Calibri, Arial,
Helvetica, sans-serif;">amdgpu_bo_reserve will return
successfully.</span></p>
<p><span style="font-size: 12pt; font-family: Calibri, Arial,
Helvetica, sans-serif;"><br>
</span></p>
<p><span style="font-size: 12pt; font-family: Calibri, Arial,
Helvetica, sans-serif;">What about there is some other
return error? What about in future somebody improve <span
style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 16px;">amdgpu_bo_reserve to return other
errors, then function <span>amdgpu_vram_scratch_fini</span>
becomes buggy?</span></span></p>
<p><span style="font-family: Calibri, Arial, Helvetica,
sans-serif, EmojiFont, 'Apple Color Emoji', 'Segoe UI
Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji',
EmojiSymbols; font-size: 16px;"><br>
</span></p>
<p>While I am thinking whether there is a better way for the
current situation, would you give a real world example that my
patch really not working? Then we can address it.</p>
<p><br>
</p>
<p>Thanks,</p>
<p>Alex Bin</p>
<br>
<div style="color: rgb(0, 0, 0);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
color="#000000" face="Calibri, sans-serif"><b>From:</b>
Christian König <a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a><br>
<b>Sent:</b> Wednesday, April 19, 2017 2:35 AM<br>
<b>To:</b> Xie, AlexBin; Zhou, David(ChunMing);
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] dmr/amdgpu: Fix wrongly unref
of BO</font>
<div> </div>
</div>
<div>
<div class="moz-cite-prefix">Hi AlexBin,<br>
<br>
the answer is ttm_bo_kunmap isn't called at all and yes in
the case of an iomap we leak the address space reserved.<br>
<br>
But that is completely harmless on a 64bit system compared
to leaking the memory backing the address space.<br>
<br>
Using <span style=""><span>amdgpu_bo_free_kernel()
instead of openly coding it here is probably a good
idea.<br>
<br>
Additional to that it's probably a good idea to set
the no_intr flag when reserving kernel BOs. It is
impossible to receive a signal during module
load/unload, but it's probably better to document that
in the code as well.<br>
<br>
Regards,<br>
Christian.<br>
</span></span><br>
Am 18.04.2017 um 20:54 schrieb Xie, AlexBin:<br>
</div>
<blockquote type="cite">
<style type="text/css" style="display:none">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div id="divtagdefaultwrapper" dir="ltr"
style="font-size:12pt; color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
Hi Christian,
<div><br>
</div>
<div>Have you found how/where/when? When you said "<span
style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:16px">mapping
will just</span><span
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:16px"> be released a bit later on", you
must know the answer. </span></div>
<div><br>
</div>
<div>It is difficult to prove something does not exist.
Anyway, I will give it a try to prove such "later on"
does not exist.</div>
<div><br>
</div>
<div><span>Function ttm_bo_kunmap is the only function
to unmap. To prove this, search <span>ttm_bo_map_iomap,
only <span style="">ttm_bo_kunmap use this enum to
correctly kunmap.</span></span></span><br>
</div>
<div><span><span><span style=""><br>
</span></span></span></div>
<div><span><span><span style=""><span>Function ttm_bo_kunmap
is not called by ttm itself. This is a hint
that all TTM delay delete mechanism or unref
mechanism will NOT kunmap BO later on.</span><br>
</span></span></span></div>
<div><span><span><span style=""><span><br>
</span></span></span></span></div>
<div><span><span><span style=""><span><span style="">Function </span><span
style="">ttm_bo_kunmap is called by AMDGPU
function <span>amdgpu_bo_kunmap and <span>amdgpu_gem_prime_vunmap.</span></span></span><br>
</span></span></span></span></div>
<div><br>
</div>
<div>Search AMDGPU for <span style="">amdgpu_bo_kunmap.
All matches do not kunmap for scratch VRAM BO. <span>amdgpu_bo_free_kernel
is a suspect but the answer is still NO.</span></span></div>
<div><span style=""><span><br>
</span></span></div>
<div><span style=""><span>So all possibilities are
searched. Did I miss anything?</span></span></div>
<div><span style=""><span><br>
</span></span></div>
<div>Thanks,</div>
<div>Alex Bin Xie</div>
<div><br>
</div>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font
style="font-size:11pt" color="#000000" face="Calibri,
sans-serif"><b>From:</b> Xie, AlexBin<br>
<b>Sent:</b> Tuesday, April 18, 2017 2:04:33 PM<br>
<b>To:</b> Christian König; Zhou, David(ChunMing); <a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] dmr/amdgpu: Fix wrongly
unref of BO</font>
<div> </div>
</div>
<div>
<div id="divtagdefaultwrapper" dir="ltr"
style="font-size:12pt; color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Hi Christian,</p>
<p><br>
</p>
<p>Would you point out where/when will kunmap happen
for this BO when release? <span
style="font-size:12pt">It must be somewhere </span><span
style="font-size:12pt"></span><span
style="font-size:12pt">in some function calls</span><span
style="font-size:12pt">.</span></p>
<p><span style="font-size:12pt"><br>
</span></p>
<p><span style="font-size:12pt">I checked before I
asked for review. But I did not see such obvious
kunmap function call.</span></p>
<p><span style="font-size:12pt"><br>
</span></p>
<p><span style="font-size:12pt">If so, there should
be a comment in function <span>amdgpu_vram_scratch_fini </span>to
avoid future confusion.</span></p>
<br>
Thanks,
<div>Alex Bin Xie<br>
<div style="color:rgb(0,0,0)">
<hr tabindex="-1" style="display:inline-block;
width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font
style="font-size:11pt" color="#000000"
face="Calibri, sans-serif"><b>From:</b>
Christian König
<a moz-do-not-send="true"
class="moz-txt-link-rfc2396E"
href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a><br>
<b>Sent:</b> Tuesday, April 18, 2017 1:46 PM<br>
<b>To:</b> Xie, AlexBin; Zhou,
David(ChunMing); <a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] dmr/amdgpu: Fix
wrongly unref of BO</font>
<div> </div>
</div>
<div>
<div class="moz-cite-prefix">Hi AlexBin,<br>
<br>
No, David is right. This is a very common
coding pattern in the kernel module.<br>
<br>
Freeing up a BO while there still exists a
kernel mapping is perfectly ok, the mapping
will just be released a bit later on.<br>
<br>
So this code is actually perfectly ok and just
an optimization, but your patch breaks it and
creates a memory leak.<br>
<br>
Regards,<br>
Christian.<br>
<br>
Am 18.04.2017 um 17:17 schrieb Xie, AlexBin:<br>
</div>
<blockquote type="cite">
<div id="divtagdefaultwrapper" dir="ltr"
style="font-size:12pt; color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Hi David,</p>
<p><br>
</p>
<p>When <span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif">amdgpu_bo_reserve
return errors, we cannot release the BO.
This is not a memory leak. General
speaking, memory leak is unnoticed and
unintentional.</span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><br>
</span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif">The
caller of function <span>amdgpu_vram_scratch_fini
ignores the return error value...</span></span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span><br>
</span></span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span>The
"memory leak" is not caused by my
patch. It is caused because reserving
BO fails.</span></span></p>
<p><span
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt"><br>
</span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span></span></span><span
style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">This patch
only aim to make function </span><span
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">amdgpu_vram_scratch_fini
behave correctly.</span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span><br>
</span></span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span>To
follow up, we can add a warning
message when
<span
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:16px">amdgpu_bo_reserve</span>
error happens in a different patch.</span></span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><span><br>
</span></span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif">If
function call </span><span
style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">amdgpu_bo_reserve is
changed to uninterruptible, this
changes driver behaviour. Without a
substantial issue, I would be cautious
for such a change.</span></p>
<p><span style="font-size:12pt;
font-family:Calibri,Arial,Helvetica,sans-serif"><br>
</span></p>
<p>Thanks,</p>
<p>Alex Bin Xie</p>
<br>
<div style="color:rgb(0,0,0)">
<hr tabindex="-1"
style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font
style="font-size:11pt" color="#000000"
face="Calibri, sans-serif"><b>From:</b>
Zhou, David(ChunMing)<br>
<b>Sent:</b> Monday, April 17, 2017
10:38 PM<br>
<b>To:</b> Xie, AlexBin; <a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH]
dmr/amdgpu: Fix wrongly unref of BO</font>
<div> </div>
</div>
<div><br>
<br>
<div class="moz-cite-prefix">On
2017年04月17日 22:54, Xie, AlexBin wrote:<br>
</div>
<blockquote type="cite">
<div id="divtagdefaultwrapper"
dir="ltr" style="font-size:12pt;
color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Hi David,</p>
<p><br>
</p>
<p>Thanks for the comments. However,
please have look at <span
style="">amdgpu_bo_reserve</span>
definition.</p>
<p><span>static inline int
amdgpu_bo_reserve(struct
amdgpu_bo *bo, bool no_intr)</span><br>
</p>
</div>
</blockquote>
Ah, this is a wired wrapper for
ttm_bo_reserve.<br>
<br>
<blockquote type="cite">
<div id="divtagdefaultwrapper"
dir="ltr" style="font-size:12pt;
color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p><span><br>
</span></p>
<p><span>When we call this function
like the following:</span></p>
<p><span><span style=""> r =
amdgpu_bo_reserve(adev->vram_scratch.robj, false);</span><br style="">
</span><span
style="font-size:12pt">The false
means <span>interruptible</span>.</span><span><br>
</span></p>
<p><span style="font-size:12pt"><br>
</span></p>
<p><span style="font-size:12pt">On
the other hand, when <span
style="">amdgpu_bo_reserve
function return error, why do
we unref BO without kunmap and
unpin the BO? This is wrong
implementation when <span
style="">amdgpu_bo_reserve
return any error.</span></span></span></p>
</div>
</blockquote>
Yeah, I see your mean, it's in driver
un-loading, How about changing to no
interruptible? Your patch will make a
memleak if bo_reserve fails, but it
seems not matter. I have no strong
preference.<br>
<br>
Regards,<br>
David Zhou<br>
<blockquote type="cite">
<div id="divtagdefaultwrapper"
dir="ltr" style="font-size:12pt;
color:#000000;
font-family:Calibri,Arial,Helvetica,sans-serif">
<p><br>
</p>
Thanks,
<div>Alex Bin Xie<br>
<br>
<div style="color:rgb(0,0,0)">
<div>
<hr tabindex="-1"
style="display:inline-block;
width:98%">
<div id="x_divRplyFwdMsg"
dir="ltr"><font
style="font-size:11pt"
color="#000000"
face="Calibri, sans-serif"><b>From:</b>
Zhou, David(ChunMing)<br>
<b>Sent:</b> Friday, April
14, 2017 12:00 AM<br>
<b>To:</b> Xie, AlexBin; <a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re:
[PATCH] dmr/amdgpu: Fix
wrongly unref of BO</font>
<div> </div>
</div>
</div>
<font size="2"><span
style="font-size:10pt">
<div class="PlainText"><br>
<br>
On 2017年04月14日 05:34, Alex
Xie wrote:<br>
> According to comment
of amdgpu_bo_reserve,
amdgpu_bo_reserve<br>
> can return with
-ERESTARTSYS. When this
function was interrupted<br>
> by a signal, BO
should not be unref.
Otherwise the BO might be<br>
> released while is
kmapped and pinned, or BO
MIGHT be deref<br>
> multiple times, etc.<br>
r =
amdgpu_bo_reserve(adev->vram_scratch.robj,
false);<br>
we have specified
interruptible to false, so
-ERESTARTSYS isn't
possible <br>
here.<br>
<br>
Thanks,<br>
David Zhou<br>
><br>
> Change-Id:
If76071a768950a0d3ad9d5da7fcae04881807621<br>
> Signed-off-by: Alex
Xie <a
moz-do-not-send="true"
class="moz-txt-link-rfc2396E"
href="mailto:AlexBin.Xie@amd.com">
<AlexBin.Xie@amd.com></a><br>
> ---<br>
>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
| 2 +-<br>
> 1 file changed, 1
insertion(+), 1
deletion(-)<br>
><br>
> diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> index
53996e3..1dcc2d1 100644<br>
> ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> @@ -355,8 +355,8 @@
static void
amdgpu_vram_scratch_fini(struct
amdgpu_device *adev)<br>
>
amdgpu_bo_kunmap(adev->vram_scratch.robj);<br>
>
amdgpu_bo_unpin(adev->vram_scratch.robj);<br>
>
amdgpu_bo_unreserve(adev->vram_scratch.robj);<br>
> +
amdgpu_bo_unref(&adev->vram_scratch.robj);<br>
> }<br>
> -
amdgpu_bo_unref(&adev->vram_scratch.robj);<br>
> }<br>
> <br>
> /**<br>
<br>
</div>
</span></font></div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
amd-gfx mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<p><br>
</p>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
amd-gfx mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<p><br>
</p>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<p><br>
</p>
</body>
</html>