[PATCH] drm/xe: Decrement client count immediately on file close

Upadhyay, Tejas tejas.upadhyay at intel.com
Wed Sep 18 11:11:10 UTC 2024



> -----Original Message-----
> From: Vivekanandan, Balasubramani
> <balasubramani.vivekanandan at intel.com>
> Sent: Wednesday, September 18, 2024 3:33 PM
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
> xe at lists.freedesktop.org
> Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
> Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
> Marchi, Lucas <lucas.demarchi at intel.com>
> Subject: Re: [PATCH] drm/xe: Decrement client count immediately on file close
> 
> On 18.09.2024 15:09, Upadhyay, Tejas wrote:
> >
> >
> > > -----Original Message-----
> > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
> > > Balasubramani Vivekanandan
> > > Sent: Wednesday, September 18, 2024 1:42 PM
> > > To: intel-xe at lists.freedesktop.org
> > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
> > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
> > > Marchi, Lucas <lucas.demarchi at intel.com>; Vivekanandan,
> > > Balasubramani <balasubramani.vivekanandan at intel.com>
> > > Subject: [PATCH] drm/xe: Decrement client count immediately on file
> > > close
> > >
> > > Decrement the client count immediately on file close. It is not
> > > required to be deferred to the resource cleanup function. Otherwise
> > > there will be a small time window, where there will be a non-zero
> > > client count even after closing all open file handles.
> > > This affects ccs_mode(xe_compute) igt tests as these tests try to
> > > change the ccs_mode immediately after closing all file handles, but
> > > the driver rejects the ccs_mode change request as it sees a non-zero client
> count.
> > >
> > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
> > > Signed-off-by: Balasubramani Vivekanandan
> > > <balasubramani.vivekanandan at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_device.c | 9 ++++-----
> > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_device.c
> > > b/drivers/gpu/drm/xe/xe_device.c index 4d3c794f134c..3bccea6212ff
> > > 100644
> > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > @@ -107,17 +107,12 @@ static int xe_file_open(struct drm_device
> > > *dev, struct drm_file *file)  static void xe_file_destroy(struct kref *ref)  {
> > >  	struct xe_file *xef = container_of(ref, struct xe_file, refcount);
> > > -	struct xe_device *xe = xef->xe;
> > >
> > >  	xa_destroy(&xef->exec_queue.xa);
> > >  	mutex_destroy(&xef->exec_queue.lock);
> > >  	xa_destroy(&xef->vm.xa);
> > >  	mutex_destroy(&xef->vm.lock);
> > >
> > > -	spin_lock(&xe->clients.lock);
> > > -	xe->clients.count--;
> > > -	spin_unlock(&xe->clients.lock);
> > > -
> > >  	xe_drm_client_put(xef->client);
> > >  	kfree(xef->process_name);
> > >  	kfree(xef);
> > > @@ -178,6 +173,10 @@ static void xe_file_close(struct drm_device
> > > *dev, struct drm_file *file)
> > >
> > >  	xe_file_put(xef);
> > >
> > > +	spin_lock(&xe->clients.lock);
> > > +	xe->clients.count--;
> > > +	spin_unlock(&xe->clients.lock);
> >
> > The file_close here is sychronus and serialized call with respect to
> userspace. Any settings done through sysfs post file_close should not required
> this change as far as I know. Would please explain scenario better?
> 
> In the current code, the client count is decremented in the function
> xe_file_destroy which is not invoked synchronously from xe_file_close.
> It is called when all references to xe_file are lost.
> References to xe_file are held during creation of vm and exec_queues. So the
> somebody might still be holding reference to xe_file while xe_file_close is
> called. Therefore the invocation of xe_file_destroy might be deferred.
> As of result, driver might see a non-zero client count even after all file handles
> are actually closed which is incorrect. We can defer only the freeing of
> resources to xe_file_destroy but the client count can be immediately adjusted
> in xe_file_close.

If whoever has hold ref, might still use ccs, why would we allow changing ccs mode in that case!

Tejas
> 
> Regards,
> Bala
> >
> > Tejas
> > > +
> > >  	xe_pm_runtime_put(xe);
> > >  }
> > >
> > > --
> > > 2.34.1
> >


More information about the Intel-xe mailing list