<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Am 05.02.2018 um 09:22 schrieb Chunming
Zhou:<br>
</div>
<blockquote type="cite"
cite="mid:d2ec14e8-d4cd-a312-5e23-ef2e2198d6ea@amd.com">
On 2018-01-31 18:54, Christian König wrote:<br>
<blockquote type="cite"
cite="mid:ff405b17-7f7c-3a9e-030b-28dcfc256546@amd.com">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8">
<div class="moz-cite-prefix">
<blockquote type="cite">So I think preventing validation on
same place is a simpler way:<br>
process B bo's place is fpfn~lpfn, it will only try to evict
LRU BOs in that range, while eviction, we just prevent those
validation to this range(fpfn~lpfn), if out of this range,
the allocation/validation still can be go on.<br>
<br>
Any negative?<br>
</blockquote>
That won't work either. The most common use of the fpfn~lpfn range
is to limit a BO to visible VRAM; the other use cases are to
fulfill hardware limitations.<br>
<br>
So blocking this would result in blocking all normal GTT and
VRAM allocations, and adding a mutex to validate would have the
same effect.<br>
</div>
</blockquote>
No, the effect is different: a mutex would serialize every
allocation. My approach only affects the busy case; that is to say,
only when space is used up is allocation serialized in that range,
otherwise everything still runs in parallel.<br>
</blockquote>
<br>
That would still not allow concurrent evictions. It's not as bad as
completely blocking concurrent validation, but still not a good idea
as far as I can see.<br>
<br>
I'm going to give it a try today: use the ww_mutex owner to detect
eviction of already reserved BOs.<br>
<br>
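A rough sketch of the direction (untested, and the helper name is
made up; it assumes we can read the acquire context straight out of
the ww_mutex embedded in the BO's reservation object):<br>
<pre>
#include <linux/ww_mutex.h>

/* Treat a BO as lockable for eviction when it is already reserved by
 * the evicting task itself, i.e. when the ww_mutex of its reservation
 * object was locked with the very ticket this operation is using. */
static bool ttm_bo_reserved_by_us(struct ttm_buffer_object *bo,
				  struct ww_acquire_ctx *ticket)
{
	return ticket && READ_ONCE(bo->resv->lock.ctx) == ticket;
}
</pre>
The LRU walk in ttm_mem_evict_first() could then treat such BOs as
evictable instead of skipping them as busy.<br>
<br>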
Maybe that can be used instead,<br>
Christian.<br>
<br>
<blockquote type="cite"
cite="mid:d2ec14e8-d4cd-a312-5e23-ef2e2198d6ea@amd.com"> <br>
Regards,<br>
David Zhou<br>
<br>
<br>
<blockquote type="cite"
cite="mid:ff405b17-7f7c-3a9e-030b-28dcfc256546@amd.com">
<div class="moz-cite-prefix"> <br>
Regards,<br>
Christian.<br>
<br>
On 31.01.2018 at 11:30, Chunming Zhou wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cdb7726b-a82d-494c-a98c-ca0100f323cc@amd.com">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 2018年01月26日 22:35, Christian
König wrote:<br>
</div>
<blockquote type="cite"
cite="mid:a6d78bba-bbe9-769b-f9d3-665cdd8c04da@gmail.com">
<div class="moz-cite-prefix">I just realized that a change
I'm thinking about for a while would solve your problem as
well, but keep concurrent allocation possible.<br>
<br>
See ttm_mem_evict_first() unlocks the BO after evicting
it:<br>
<blockquote type="cite"> ttm_bo_del_from_lru(bo);<br>
spin_unlock(&glob->lru_lock);<br>
<br>
ret = ttm_bo_evict(bo, ctx);<br>
if (locked) {<br>
ttm_bo_unreserve(bo); <-------- here<br>
} else {<br>
spin_lock(&glob->lru_lock);<br>
ttm_bo_add_to_lru(bo);<br>
spin_unlock(&glob->lru_lock);<br>
}<br>
<br>
kref_put(&bo->list_kref,
ttm_bo_release_list);<br>
</blockquote>
<br>
The effect is that in your example process C can beat process B
not only once but many times, because we run into a ping/pong
situation where B evicts resources while C moves them back in.<br>
</div>
</blockquote>
For the ping/pong case, I want to disable busy placement during the
allocation period and only enable it for CS BO validation.<br>
<br>
<blockquote type="cite"
cite="mid:a6d78bba-bbe9-769b-f9d3-665cdd8c04da@gmail.com">
<div class="moz-cite-prefix"> <br>
For a while now I've been thinking about dropping those
reservations only after the original allocation has succeeded.<br>
<br>
The effect would be that process C can still beat process
B initially, but sooner or later process B would evict some
resources from process C as well, and then it can succeed
with its allocation.<br>
</div>
</blockquote>
If it comes from process C's CS validation, process B can still
only evict the resource after process C's command submission has
completed.<br>
<br>
<blockquote type="cite"
cite="mid:a6d78bba-bbe9-769b-f9d3-665cdd8c04da@gmail.com">
<div class="moz-cite-prefix"> <br>
The problem is that for this approach to work we need a core
change to the ww_mutexes to be able to handle this
efficiently.<br>
</div>
</blockquote>
Yes, ww_mutex doesn't support this kind of nested locking, which
easily deadlocks without a ticket and class.<br>
<br>
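(For reference, the ticket pattern ww_mutex supports today looks
roughly like this; a minimal sketch with a made-up function name,
using the existing reservation_ww_class. The "evict BOs already
reserved with the same ticket" case has no efficient equivalent
here:)<br>
<pre>
#include <linux/reservation.h>
#include <linux/ww_mutex.h>

/* Take two reservations without deadlocking: on -EDEADLK we back off,
 * sleep on the contended lock and retry with the same (older) ticket. */
static void reserve_two(struct ww_mutex *m1, struct ww_mutex *m2,
			struct ww_acquire_ctx *ticket)
{
	while (true) {
		if (ww_mutex_lock(m1, ticket) == -EDEADLK)
			/* We hold nothing yet, so just sleep-wait. */
			ww_mutex_lock_slow(m1, ticket);
		if (ww_mutex_lock(m2, ticket) != -EDEADLK)
			return; /* both locked */
		/* Lost to an older ticket: drop what we hold and retry,
		 * starting with the contended lock. */
		ww_mutex_unlock(m1);
		swap(m1, m2);
	}
}

/* Caller side:
 *	ww_acquire_init(&ticket, &reservation_ww_class);
 *	reserve_two(&a->resv->lock, &b->resv->lock, &ticket);
 *	ww_acquire_done(&ticket);
 *	...
 *	ww_mutex_unlock(...); ww_acquire_fini(&ticket);
 */
</pre>
<br>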
So I think preventing validation on the same place is a simpler
way:<br>
process B's BO placement is fpfn~lpfn, so it will only try to evict
LRU BOs in that range; during eviction we just block other
validations into this range (fpfn~lpfn), while outside of this
range allocation/validation can still go on.<br>
<br>
Any negatives?<br>
<br>
Regards,<br>
David Zhou<br>
<blockquote type="cite"
cite="mid:a6d78bba-bbe9-769b-f9d3-665cdd8c04da@gmail.com">
<div class="moz-cite-prefix"> <br>
Regards,<br>
Christian.<br>
<br>
On 26.01.2018 at 14:59, Christian König wrote:<br>
</div>
<blockquote type="cite"
cite="mid:e3ed22b7-ca1a-09d0-47a0-204affef780f@amd.com">
<div class="moz-cite-prefix">I know, but this has the same
effect. You prevent concurrent allocation from
happening.<br>
<br>
What we could do is to pipeline reusing of deleted
memory as well, this makes it less likely to cause the
problem you are seeing because the evicting processes
doesn't need to block for deleted BOs any more.<br>
<br>
But that other processes can grab memory during eviction
is intentional. Otherwise greedy processes would
completely dominate command submission.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 26.01.2018 at 14:50, Zhou, David(ChunMing) wrote:<br>
</div>
<blockquote type="cite" cite="mid:smartisan1516974623698">
<meta content="text/html; charset=utf-8">
<div name="smartisanmessageid"
id="smartisan1516974623698"><font color="#333333">I
don't want to prevent all, my new approach is to
prevent the later allocation is trying and ahead of
front to get the memory space that the front made
from eviction. <br>
</font><br>
<br>
<span id="smartisan_signature" style="font-size:0.8em;
display:inline; color:#888888">
<p dir="ltr">发自坚果 Pro</p>
</span>
<style type="text/css">
<!--
* body
{padding:0 16px 30px!important;
margin:0!important;
background-color:#ffffff;
line-height:1.4;
word-wrap:break-word;
word-break:normal}
div
{word-wrap:break-word;
word-break:normal}
p
{word-wrap:break-word;
word-break:normal;
text-indent:0pt!important}
span
{word-wrap:break-word;
word-break:normal}
a
{word-wrap:break-word;
word-break:normal}
td
{word-wrap:break-word;
word-break:break-all}
-->
</style>
<div class="quote">
<div style="margin:0 0px; font-size:105%"><font
style="line-height:1.4" color="#629140"><span>Christian
K鰊ig <a class="moz-txt-link-rfc2396E"
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true"><ckoenig.leichtzumerken@gmail.com></a>
于 2018年1月26日 下午9:24写道:</span></font></div>
<br type="attribution">
</div>
</div>
<div>
<div class="moz-cite-prefix">Yes, exactly that's the
problem.<br>
<br>
See when you want to prevent a process B from
allocating the memory process A has evicted, you
need to prevent all concurrent allocation.<br>
<br>
And we don't do that because it causes a major
performance drop.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 26.01.2018 at 14:21, Zhou,
David(ChunMing) wrote:<br>
</div>
<blockquote type="cite">
<meta content="text/html; charset=utf-8">
<div name="smartisanmessageid"
id="smartisan1516972869817"><font color="#333333">You
patch will prevent concurrent allocation, and
will result in allocation performance drop much.</font><br>
<br>
<span id="smartisan_signature"
style="font-size:0.8em; display:inline;
color:#888888">
<p dir="ltr">发自坚果 Pro</p>
</span>
<style type="text/css">
<!--
* body
{padding:0 16px 30px!important;
margin:0!important;
background-color:#ffffff;
line-height:1.4;
word-wrap:break-word;
word-break:normal}
div
{word-wrap:break-word;
word-break:normal}
p
{word-wrap:break-word;
word-break:normal;
text-indent:0pt!important}
span
{word-wrap:break-word;
word-break:normal}
a
{word-wrap:break-word;
word-break:normal}
td
{word-wrap:break-word;
word-break:break-all}
-->
</style>
<div class="quote">
<div style="margin:0 0px; font-size:105%"><font
style="line-height:1.4" color="#629140"><span>Christian
K鰊ig <a class="moz-txt-link-rfc2396E"
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true"><ckoenig.leichtzumerken@gmail.com></a>
于 2018年1月26日 下午9:04写道:</span></font></div>
<br type="attribution">
</div>
</div>
<div>
<div class="moz-cite-prefix">Attached is what you
actually want to do cleanly implemented. But as
I said this is a NO-GO.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 26.01.2018 at 13:43, Christian König wrote:<br>
</div>
<blockquote type="cite">
<div class="moz-cite-prefix">
<blockquote type="cite">After my
investigation, this issue should be detect
of TTM design self, which breaks scheduling
balance.<br>
</blockquote>
Yeah, but again: this is intended design that we
can't change easily.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 26.01.2018 at 13:36, Zhou,
David(ChunMing) wrote:<br>
</div>
<blockquote type="cite">
<meta name="Generator" content="Microsoft
Exchange Server">
<style>
<!--
.EmailQuote
{margin-left:1pt;
padding-left:4pt;
border-left:#800000 2px solid}
-->
</style>
<div>
<div name="x_smartisanmessageid"
id="x_smartisan1516970199102"><font
color="#333333">I am off work, so reply
mail by phone, the format could not be
text.<br>
</font><br>
back to topic itself:<br>
the problem indeed happen on amdgpu
driver, someone reports me that
application runs with two instances, the
performance are different.<br>
I also reproduced the issue with unit
test(bo_eviction_test). They always think
our scheduler isn't working as expected.<br>
<br>
After my investigation, this issue should
be detect of TTM design self, which breaks
scheduling balance.<br>
<br>
Further, if we run containers for our gpu,
container A could run high score,
container B runs low score with same
benchmark.<br>
<br>
So this is bug that we need fix.<br>
<br>
Regards,<br>
David Zhou<br>
<br>
<span id="x_smartisan_signature"
style="font-size:0.8em; display:inline;
color:#888888">
<p dir="ltr">发自坚果 Pro</p>
</span>
<style type="text/css">
<!--
* body
{padding:0 16px 30px!important;
margin:0!important;
background-color:#ffffff;
line-height:1.4;
word-wrap:break-word;
word-break:normal}
div
{word-wrap:break-word;
word-break:normal}
p
{word-wrap:break-word;
word-break:normal;
text-indent:0pt!important}
span
{word-wrap:break-word;
word-break:normal}
a
{word-wrap:break-word;
word-break:normal}
td
{word-wrap:break-word;
word-break:break-all}
-->
</style>
<div class="x_quote">
<div style="margin:0 0px;
font-size:105%"><font
style="line-height:1.4"
color="#629140"><span>Christian K鰊ig
<a class="moz-txt-link-rfc2396E"
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true"><ckoenig.leichtzumerken@gmail.com></a>
于 2018年1月26日 下午6:31写道:</span></font></div>
<br type="attribution">
</div>
</div>
</div>
<font size="2"><span style="font-size:11pt">
<div class="PlainText">Am 26.01.2018 um
11:22 schrieb Chunming Zhou:<br>
> there is a scheduling balance issue
about get node like:<br>
> a. process A allocates full memory
and use it for submission.<br>
> b. process B tries to allocates
memory, will wait for process A BO idle
in eviction.<br>
> c. process A completes the job,
process B eviction will put process A BO
node,<br>
> but in the meantime, process C is
comming to allocate BO, whill directly
get node successfully, and do
submission,<br>
> process B will again wait for
process C BO idle.<br>
> d. repeat the above setps, process
B could be delayed much more.<br>
><br>
> later allocation must not be ahead
of front in same place.<br>
<br>
Again, NAK to the whole approach.<br>
<br>
At least with amdgpu the problem you describe above never occurs,
because evictions are pipelined operations. We could only block
waiting for deleted regions to become free.<br>
<br>
But independent of that, incoming memory requests that arrive while
we make room through eviction are intended to be served first.<br>
<br>
Changing that is certainly a no-go, because it would favor
memory-hungry applications over small clients.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<pre>
>
> Change-Id: I3daa892e50f82226c552cc008a29e55894a98f18
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c    | 69 +++++++++++++++++++++++++++++++++++++++--
>  include/drm/ttm/ttm_bo_api.h    |  7 +++++
>  include/drm/ttm/ttm_bo_driver.h |  7 +++++
>  3 files changed, 80 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..558ec2cf465d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -841,6 +841,58 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
>  	return 0;
>  }
>
> +static void ttm_man_init_waiter(struct ttm_bo_waiter *waiter,
> +				struct ttm_buffer_object *bo,
> +				const struct ttm_place *place)
> +{
> +	waiter->tbo = bo;
> +	memcpy((void *)&waiter->place, (void *)place, sizeof(*place));
> +	INIT_LIST_HEAD(&waiter->list);
> +}
> +
> +static void ttm_man_add_waiter(struct ttm_mem_type_manager *man,
> +			       struct ttm_bo_waiter *waiter)
> +{
> +	if (!waiter)
> +		return;
> +	spin_lock(&man->wait_lock);
> +	list_add_tail(&waiter->list, &man->waiter_list);
> +	spin_unlock(&man->wait_lock);
> +}
> +
> +static void ttm_man_del_waiter(struct ttm_mem_type_manager *man,
> +			       struct ttm_bo_waiter *waiter)
> +{
> +	if (!waiter)
> +		return;
> +	spin_lock(&man->wait_lock);
> +	if (!list_empty(&waiter->list))
> +		list_del(&waiter->list);
> +	spin_unlock(&man->wait_lock);
> +	kfree(waiter);
> +}
> +
> +int ttm_man_check_bo(struct ttm_mem_type_manager *man,
> +		     struct ttm_buffer_object *bo,
> +		     const struct ttm_place *place)
> +{
> +	struct ttm_bo_waiter *waiter, *tmp;
> +
> +	spin_lock(&man->wait_lock);
> +	list_for_each_entry_safe(waiter, tmp, &man->waiter_list, list) {
> +		if ((bo != waiter->tbo) &&
> +		    ((place->fpfn >= waiter->place.fpfn &&
> +		      place->fpfn <= waiter->place.lpfn) ||
> +		     (place->lpfn <= waiter->place.lpfn && place->lpfn >=
> +		      waiter->place.fpfn)))
> +			goto later_bo;
> +	}
> +	spin_unlock(&man->wait_lock);
> +	return true;
> +later_bo:
> +	spin_unlock(&man->wait_lock);
> +	return false;
> +}
>  /**
>   * Repeatedly evict memory from the LRU for @mem_type until we create enough
>   * space, or we've evicted everything and there isn't enough space.
> @@ -853,17 +905,26 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>  {
>  	struct ttm_bo_device *bdev = bo->bdev;
>  	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
> +	struct ttm_bo_waiter waiter;
>  	int ret;
>
> +	ttm_man_init_waiter(&waiter, bo, place);
> +	ttm_man_add_waiter(man, &waiter);
>  	do {
>  		ret = (*man->func->get_node)(man, bo, place, mem);
> -		if (unlikely(ret != 0))
> +		if (unlikely(ret != 0)) {
> +			ttm_man_del_waiter(man, &waiter);
>  			return ret;
> -		if (mem->mm_node)
> +		}
> +		if (mem->mm_node) {
> +			ttm_man_del_waiter(man, &waiter);
>  			break;
> +		}
>  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -		if (unlikely(ret != 0))
> +		if (unlikely(ret != 0)) {
> +			ttm_man_del_waiter(man, &waiter);
>  			return ret;
> +		}
>  	} while (1);
>  	mem->mem_type = mem_type;
>  	return ttm_bo_add_move_fence(bo, man, mem);
> @@ -1450,6 +1511,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
>  	man->use_io_reserve_lru = false;
>  	mutex_init(&man->io_reserve_mutex);
>  	spin_lock_init(&man->move_lock);
> +	spin_lock_init(&man->wait_lock);
> +	INIT_LIST_HEAD(&man->waiter_list);
>  	INIT_LIST_HEAD(&man->io_reserve_lru);
>
>  	ret = bdev->driver->init_mem_type(bdev, type, man);
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index 2cd025c2abe7..0fce4dbd02e7 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -40,6 +40,7 @@
>  #include <linux/mm.h>
>  #include <linux/bitmap.h>
>  #include <linux/reservation.h>
> +#include <drm/ttm/ttm_placement.h>
>
>  struct ttm_bo_device;
>
> @@ -232,6 +233,12 @@ struct ttm_buffer_object {
>  	struct mutex wu_mutex;
>  };
>
> +struct ttm_bo_waiter {
> +	struct ttm_buffer_object *tbo;
> +	struct ttm_place place;
> +	struct list_head list;
> +};
> +
>  /**
>   * struct ttm_bo_kmap_obj
>   *
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 9b417eb2df20..dc6b8b4c9e06 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -293,6 +293,10 @@ struct ttm_mem_type_manager {
>  	bool io_reserve_fastpath;
>  	spinlock_t move_lock;
>
> +	/* waiters in list */
> +	spinlock_t wait_lock;
> +	struct list_head waiter_list;
> +
>  	/*
>  	 * Protected by @io_reserve_mutex:
>  	 */
> @@ -748,6 +752,9 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>  		     struct ttm_mem_reg *mem,
>  		     struct ttm_operation_ctx *ctx);
>
> +int ttm_man_check_bo(struct ttm_mem_type_manager *man,
> +		     struct ttm_buffer_object *bo,
> +		     const struct ttm_place *place);
>  void ttm_bo_mem_put(struct ttm_buffer_object *bo, struct ttm_mem_reg *mem);
>  void ttm_bo_mem_put_locked(struct ttm_buffer_object *bo,
>  			   struct ttm_mem_reg *mem);
</pre>
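<br>
(Side note on the range check: ttm_man_check_bo above only fires when
one of the incoming range's endpoints falls inside a waiter's range.
Ignoring TTM's lpfn == 0 "no limit" convention, a fully general
closed-interval intersection test would be the hypothetical helper
below, which also catches a waiter range lying strictly inside the
incoming one:)<br>
<pre>
/* Illustrative only: [a->fpfn, a->lpfn] and [b->fpfn, b->lpfn]
 * intersect iff each range starts no later than the other ends. */
static inline bool ttm_places_overlap(const struct ttm_place *a,
				      const struct ttm_place *b)
{
	return a->fpfn <= b->lpfn && b->fpfn <= a->lpfn;
}
</pre>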
</div>
</span></font><br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<br>
</blockquote>
<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
dri-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org" moz-do-not-send="true">dri-devel@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/dri-devel" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/dri-devel</a>
</pre>
</blockquote>
<br>
</div>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<br>
</body>
</html>