[PATCH 1/2] drm/amdkfd: Fix GWS queue count

Yat Sin, David David.YatSin at amd.com
Thu Apr 14 15:08:08 UTC 2022



> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Thursday, April 14, 2022 10:52 AM
> To: Yat Sin, David <David.YatSin at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count
> 
> 
> Am 2022-04-13 um 11:51 schrieb David Yat Sin:
> > Queue can be inactive during process termination. This would cause
> > dqm->gws_queue_count to not be decremented. There can only be 1 GWS
> > queue per device process so moving the logic out of loop.
> >
> > Signed-off-by: David Yat Sin <david.yatsin at amd.com>
> > ---
> >   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c    | 12 ++++++-----
> -
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > index acf4f7975850..7c107b88d944 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> > @@ -1945,17 +1945,17 @@ static int process_termination_cpsch(struct
> device_queue_manager *dqm,
> >   		else if (q->properties.type ==
> KFD_QUEUE_TYPE_SDMA_XGMI)
> >   			deallocate_sdma_queue(dqm, q);
> >
> > -		if (q->properties.is_active) {
> > +		if (q->properties.is_active)
> >   			decrement_queue_count(dqm, q->properties.type);
> > -			if (q->properties.is_gws) {
> > -				dqm->gws_queue_count--;
> > -				qpd->mapped_gws_queue = false;
> > -			}
> > -		}
> 
> This looks like the original intention was to decrement the counter for inactive
> GWS queues. This makes sense because this counter is used to determine
> whether the runlist is oversubscribed. Inactive queues are not in the runlist,
> so they should not be counted.
> 
> I see that the counter is updated when queues are activated and deactivated
> in update_queue. So IMO this patch is both incorrect and unnecessary. Did
> you see an actual problem with the GWS counter during process termination?
> If so, I'd like to know more to understand the root cause.
> 
> Regards,
>    Felix

Yes, when using a unit test (for example KFDGWSTest.Semaphore), 
1. Put a sleep in the middle of the application (after calling hsaKmtAllocQueueGWS)
2. Run application and kill the application which it is in sleep (application never calls queue.Destroy()).

Then inside kernel dqm->gws_queue_count keeps incrementing each time the experiment is repeated and never goes back to 0. This seems to be because the sleep causes the process to be evicted and q->properties.is_active is false so code does not enter that if statement.

Regards,
David

> 
> 
> >
> >   		dqm->total_queue_count--;
> >   	}
> >
> > +	if (qpd->mapped_gws_queue) {
> > +		dqm->gws_queue_count--;
> > +		qpd->mapped_gws_queue = false;
> > +	}
> > +
> >   	/* Unregister process */
> >   	list_for_each_entry_safe(cur, next_dpn, &dqm->queues, list) {
> >   		if (qpd == cur->qpd) {


More information about the amd-gfx mailing list