[PATCH 1/2] drm/amdkfd: Fix GWS queue count
Yat Sin, David
David.YatSin at amd.com
Thu Apr 14 15:08:08 UTC 2022
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Thursday, April 14, 2022 10:52 AM
> To: Yat Sin, David <David.YatSin at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count
>
>
> Am 2022-04-13 um 11:51 schrieb David Yat Sin:
> > Queue can be inactive during process termination. This would cause
> > dqm->gws_queue_count to not be decremented. There can only be 1 GWS
> > queue per device process so moving the logic out of loop.
> >
> > Signed-off-by: David Yat Sin <david.yatsin at amd.com>
> > ---
> > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 12 ++++++-----
> -
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > index acf4f7975850..7c107b88d944 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > @@ -1945,17 +1945,17 @@ static int process_termination_cpsch(struct
> device_queue_manager *dqm,
> > else if (q->properties.type ==
> KFD_QUEUE_TYPE_SDMA_XGMI)
> > deallocate_sdma_queue(dqm, q);
> >
> > - if (q->properties.is_active) {
> > + if (q->properties.is_active)
> > decrement_queue_count(dqm, q->properties.type);
> > - if (q->properties.is_gws) {
> > - dqm->gws_queue_count--;
> > - qpd->mapped_gws_queue = false;
> > - }
> > - }
>
> This looks like the original intention was to decrement the counter for inactive
> GWS queues. This makes sense because this counter is used to determine
> whether the runlist is oversubscribed. Inactive queues are not in the runlist,
> so they should not be counted.
>
> I see that the counter is updated when queues are activated and deactivated
> in update_queue. So IMO this patch is both incorrect and unnecessary. Did
> you see an actual problem with the GWS counter during process termination?
> If so, I'd like to know more to understand the root cause.
>
> Regards,
> Felix
Yes, when using a unit test (for example KFDGWSTest.Semaphore),
1. Put a sleep in the middle of the application (after calling hsaKmtAllocQueueGWS)
2. Run application and kill the application which it is in sleep (application never calls queue.Destroy()).
Then inside kernel dqm->gws_queue_count keeps incrementing each time the experiment is repeated and never goes back to 0. This seems to be because the sleep causes the process to be evicted and q->properties.is_active is false so code does not enter that if statement.
Regards,
David
>
>
> >
> > dqm->total_queue_count--;
> > }
> >
> > + if (qpd->mapped_gws_queue) {
> > + dqm->gws_queue_count--;
> > + qpd->mapped_gws_queue = false;
> > + }
> > +
> > /* Unregister process */
> > list_for_each_entry_safe(cur, next_dpn, &dqm->queues, list) {
> > if (qpd == cur->qpd) {
More information about the amd-gfx
mailing list