<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Felix,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
There is no big picture unfortunately, just some improvements I came across while navigating the code. </div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Regarding your suggestion, I have a concern. With the original code in unmap_queues_cpsch(), if amdkfd_fence_wait_timeout() fails, we won't release the runlist ib. I am not sure whether that is by design or just a small bug. If it is by design (probably for debugging
 when the HWS hangs), merging pm_send_unmap_queue and pm_release_ib together will break that design. </div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
If we agree to move in that direction, I agree with the proposed name changes, because the original names are prone to causing confusion.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Regards,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Yong</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Kuehling, Felix <Felix.Kuehling@amd.com><br>
<b>Sent:</b> Friday, November 22, 2019 4:21 PM<br>
<b>To:</b> Zhao, Yong <Yong.Zhao@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
<b>Subject:</b> Re: [PATCH 2/2] drm/amdkfd: Move pm_create_runlist_ib() out of pm_send_runlist()</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">I'm not sure about this one. Looks like the interface is getting
<br>
needlessly more complicated. Now the caller has to keep track of the <br>
runlist IB address and size just to pass those to another function. I <br>
could understand this if there was a use case that needs to separate the <br>
allocation of the runlist and sending it to the HW. But I don't see that.<br>
<br>
Some background for why I think the interface is the way it is: The <br>
runlist IB is continuously executed by the HWS firmware. If the runlist <br>
is oversubscribed, the HWS firmware will loop through it. So the IB must <br>
remain allocated until pm_send_unmap_queue is called. Currently <br>
pm_send_runlist creates the runlist IB and sends it to the HWS. You're <br>
separating that into creation and sending. Do you see a case where you <br>
need to send the same runlist multiple times? Or do something else <br>
between creating the runlist and sending it to the HWS?<br>
<br>
pm_release_ib releases the runlist IB, assuming that the HWS is no longer <br>
using it. Maybe this could be combined with pm_send_unmap_queue. I'm not <br>
100% sure because there are some filter parameters that may leave some <br>
queues mapped. If the two can be combined, I'd suggest the following <br>
name and interface changes to reflect how I think this is being used today:<br>
<br>
  * pm_send_runlist -> pm_create_and_send_runlist<br>
  * pm_send_unmap_queue + pm_release_ib -> pm_preempt_and_free_runlist<br>
<br>
I see you're doing a lot of cleanup and refactoring in this area of the <br>
code. Is there some bigger picture here, some idea of the end-state <br>
you're trying to get to? Knowing where you're going with this may make <br>
it easier to review the code.<br>
<br>
Regards,<br>
   Felix<br>
<br>
On 2019-11-21 4:26 p.m., Yong Zhao wrote:<br>
> This is consistent with the calling sequence in unmap_queues_cpsch().<br>
><br>
> Change-Id: Ieb6714422c812d4f6ebbece34e339871471e4b5e<br>
> Signed-off-by: Yong Zhao <Yong.Zhao@amd.com><br>
> ---<br>
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 18 +++++++++++++++--<br>
>   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 20 +++++--------------<br>
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  7 ++++++-<br>
>   3 files changed, 27 insertions(+), 18 deletions(-)<br>
><br>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
> index 510f2d1bb8bb..fd7d90136b94 100644<br>
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c<br>
> @@ -1302,6 +1302,8 @@ static int unmap_sdma_queues(struct device_queue_manager *dqm)<br>
>   static int map_queues_cpsch(struct device_queue_manager *dqm)<br>
>   {<br>
>        int retval;<br>
> +     uint64_t rl_ib_gpu_addr;<br>
> +     size_t rl_ib_size;<br>
>   <br>
>        if (!dqm->sched_running)<br>
>                return 0;<br>
> @@ -1310,15 +1312,27 @@ static int map_queues_cpsch(struct device_queue_manager *dqm)<br>
>        if (dqm->active_runlist)<br>
>                return 0;<br>
>   <br>
> -     retval = pm_send_runlist(&dqm->packets, &dqm->queues);<br>
> +     retval = pm_create_runlist_ib(&dqm->packets, &dqm->queues,<br>
> +                             &rl_ib_gpu_addr, &rl_ib_size);<br>
> +     if (retval)<br>
> +             goto fail_create_runlist_ib;<br>
> +<br>
> +     pr_debug("runlist IB address: 0x%llX\n", rl_ib_gpu_addr);<br>
> +<br>
> +     retval = pm_send_runlist(&dqm->packets, &dqm->queues,<br>
> +                     rl_ib_gpu_addr, rl_ib_size);<br>
>        pr_debug("%s sent runlist\n", __func__);<br>
>        if (retval) {<br>
>                pr_err("failed to execute runlist\n");<br>
> -             return retval;<br>
> +             goto fail_create_runlist_ib;<br>
>        }<br>
>        dqm->active_runlist = true;<br>
>   <br>
>        return retval;<br>
> +<br>
> +fail_create_runlist_ib:<br>
> +     pm_destroy_runlist_ib(&dqm->packets);<br>
> +     return retval;<br>
>   }<br>
>   <br>
>   /* dqm->lock mutex has to be locked before calling this function */<br>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
> index 4a9433257428..6ec54e9f9392 100644<br>
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c<br>
> @@ -116,7 +116,7 @@ static int pm_allocate_runlist_ib(struct packet_manager *pm,<br>
>        return retval;<br>
>   }<br>
>   <br>
> -static int pm_create_runlist_ib(struct packet_manager *pm,<br>
> +int pm_create_runlist_ib(struct packet_manager *pm,<br>
>                                struct list_head *queues,<br>
>                                uint64_t *rl_gpu_addr,<br>
>                                size_t *rl_size_bytes)<br>
> @@ -149,7 +149,6 @@ static int pm_create_runlist_ib(struct packet_manager *pm,<br>
>                /* build map process packet */<br>
>                if (proccesses_mapped >= pm->dqm->processes_count) {<br>
>                        pr_debug("Not enough space left in runlist IB\n");<br>
> -                     pm_destroy_runlist_ib(pm);<br>
>                        return -ENOMEM;<br>
>                }<br>
>   <br>
> @@ -299,20 +298,13 @@ int pm_send_set_resources(struct packet_manager *pm,<br>
>        return retval;<br>
>   }<br>
>   <br>
> -int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)<br>
> +int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues,<br>
> +                     uint64_t rl_ib_gpu_addr, size_t rl_ib_size)<br>
>   {<br>
> -     uint64_t rl_gpu_ib_addr;<br>
>        uint32_t *rl_buffer;<br>
> -     size_t rl_ib_size, packet_size_dwords;<br>
> +     size_t packet_size_dwords;<br>
>        int retval;<br>
>   <br>
> -     retval = pm_create_runlist_ib(pm, dqm_queues, &rl_gpu_ib_addr,<br>
> -                                     &rl_ib_size);<br>
> -     if (retval)<br>
> -             goto fail_create_runlist_ib;<br>
> -<br>
> -     pr_debug("runlist IB address: 0x%llX\n", rl_gpu_ib_addr);<br>
> -<br>
>        packet_size_dwords = pm->pmf->runlist_size / sizeof(uint32_t);<br>
>        mutex_lock(&pm->lock);<br>
>   <br>
> @@ -321,7 +313,7 @@ int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)<br>
>        if (retval)<br>
>                goto fail_acquire_packet_buffer;<br>
>   <br>
> -     retval = pm->pmf->runlist(pm, rl_buffer, rl_gpu_ib_addr,<br>
> +     retval = pm->pmf->runlist(pm, rl_buffer, rl_ib_gpu_addr,<br>
>                                        rl_ib_size / sizeof(uint32_t), false);<br>
>        if (retval)<br>
>                goto fail_create_runlist;<br>
> @@ -336,8 +328,6 @@ int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)<br>
>        kq_rollback_packet(pm->priv_queue);<br>
>   fail_acquire_packet_buffer:<br>
>        mutex_unlock(&pm->lock);<br>
> -fail_create_runlist_ib:<br>
> -     pm_destroy_runlist_ib(pm);<br>
>        return retval;<br>
>   }<br>
>   <br>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
> index 389cda7c8f1a..6accb605b9f0 100644<br>
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h<br>
> @@ -980,7 +980,8 @@ int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);<br>
>   void pm_uninit(struct packet_manager *pm);<br>
>   int pm_send_set_resources(struct packet_manager *pm,<br>
>                                struct scheduling_resources *res);<br>
> -int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues);<br>
> +int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues,<br>
> +             uint64_t rl_ib_gpu_addr, size_t rl_ib_size);<br>
>   int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,<br>
>                                uint32_t fence_value);<br>
>   <br>
> @@ -989,6 +990,10 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,<br>
>                        uint32_t filter_param, bool reset,<br>
>                        unsigned int sdma_engine);<br>
>   <br>
> +int pm_create_runlist_ib(struct packet_manager *pm,<br>
> +                             struct list_head *queues,<br>
> +                             uint64_t *rl_gpu_addr,<br>
> +                             size_t *rl_size_bytes);<br>
>   void pm_destroy_runlist_ib(struct packet_manager *pm);<br>
>   <br>
>   /* Following PM funcs can be shared among VI and AI */<br>
</div>
</span></font></div>
</div>
</body>
</html>