[PATCH] drm/xe: Decrement client count immediately on file close

Vivekanandan, Balasubramani balasubramani.vivekanandan at intel.com
Tue Oct 8 07:44:27 UTC 2024


On 19.09.2024 06:55, Lucas De Marchi wrote:
> On Thu, Sep 19, 2024 at 01:49:04PM GMT, Vivekanandan, Balasubramani wrote:
> > On 18.09.2024 15:43, Lucas De Marchi wrote:
> > > On Wed, Sep 18, 2024 at 11:11:10AM GMT, Upadhyay, Tejas wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Vivekanandan, Balasubramani
> > > > > <balasubramani.vivekanandan at intel.com>
> > > > > Sent: Wednesday, September 18, 2024 3:33 PM
> > > > > To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
> > > > > xe at lists.freedesktop.org
> > > > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
> > > > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
> > > > > Marchi, Lucas <lucas.demarchi at intel.com>
> > > > > Subject: Re: [PATCH] drm/xe: Decrement client count immediately on file close
> > > > >
> > > > > On 18.09.2024 15:09, Upadhyay, Tejas wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
> > > > > > > Balasubramani Vivekanandan
> > > > > > > Sent: Wednesday, September 18, 2024 1:42 PM
> > > > > > > To: intel-xe at lists.freedesktop.org
> > > > > > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
> > > > > > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
> > > > > > > Marchi, Lucas <lucas.demarchi at intel.com>; Vivekanandan,
> > > > > > > Balasubramani <balasubramani.vivekanandan at intel.com>
> > > > > > > Subject: [PATCH] drm/xe: Decrement client count immediately on file
> > > > > > > close
> > > > > > >
> > > > > > > Decrement the client count immediately on file close. It is not
> > > > > > > required to be deferred to the resource cleanup function. Otherwise
> > > > > > > there will be a small time window, where there will be a non-zero
> > > > > > > client count even after closing all open file handles.
> > > > > > > This affects ccs_mode(xe_compute) igt tests as these tests try to
> > > > > > > change the ccs_mode immediately after closing all file handles, but
> > > > > > > the driver rejects the ccs_mode change request as it sees a non-zero client
> > > > > count.
> > > > > > >
> > > > > > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
> > > > > > > Signed-off-by: Balasubramani Vivekanandan
> > > > > > > <balasubramani.vivekanandan at intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/xe/xe_device.c | 9 ++++-----
> > > > > > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c
> > > > > > > b/drivers/gpu/drm/xe/xe_device.c index 4d3c794f134c..3bccea6212ff
> > > > > > > 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > > > @@ -107,17 +107,12 @@ static int xe_file_open(struct drm_device
> > > > > > > *dev, struct drm_file *file)  static void xe_file_destroy(struct kref *ref)  {
> > > > > > >  	struct xe_file *xef = container_of(ref, struct xe_file, refcount);
> > > > > > > -	struct xe_device *xe = xef->xe;
> > > > > > >
> > > > > > >  	xa_destroy(&xef->exec_queue.xa);
> > > > > > >  	mutex_destroy(&xef->exec_queue.lock);
> > > > > > >  	xa_destroy(&xef->vm.xa);
> > > > > > >  	mutex_destroy(&xef->vm.lock);
> > > > > > >
> > > > > > > -	spin_lock(&xe->clients.lock);
> > > > > > > -	xe->clients.count--;
> > > > > > > -	spin_unlock(&xe->clients.lock);
> > > > > > > -
> > > > > > >  	xe_drm_client_put(xef->client);
> > > > > > >  	kfree(xef->process_name);
> > > > > > >  	kfree(xef);
> > > > > > > @@ -178,6 +173,10 @@ static void xe_file_close(struct drm_device
> > > > > > > *dev, struct drm_file *file)
> > > > > > >
> > > > > > >  	xe_file_put(xef);
> > > > > > >
> > > > > > > +	spin_lock(&xe->clients.lock);
> > > > > > > +	xe->clients.count--;
> > > > > > > +	spin_unlock(&xe->clients.lock);
> > > > > >
> > > > > > The file_close here is sychronus and serialized call with respect to
> > > > > userspace. Any settings done through sysfs post file_close should not required
> > > > > this change as far as I know. Would please explain scenario better?
> > > > >
> > > > > In the current code, the client count is decremented in the function
> > > > > xe_file_destroy which is not invoked synchronously from xe_file_close.
> > > > > It is called when all references to xe_file are lost.
> > > > > References to xe_file are held during creation of vm and exec_queues. So the
> > > > > somebody might still be holding reference to xe_file while xe_file_close is
> > > > > called. Therefore the invocation of xe_file_destroy might be deferred.
> > > > > As of result, driver might see a non-zero client count even after all file handles
> > > > > are actually closed which is incorrect. We can defer only the freeing of
> > > > > resources to xe_file_destroy but the client count can be immediately adjusted
> > > > > in xe_file_close.
> > > >
> > > > If whoever has hold ref, might still use ccs, why would we allow changing ccs mode in that case!
> > > 
> > > if the file is closed, there wouldn't be any submission at all. ref is
> > > kept alive for the data to be available for updates on the sw side only.
> > > 
> > > but I'm wondering on the value of this client tracking... this is so
> > > uncommon that we maybe should rely just on the list of open files being
> > > empty?
> > 
> > Using this client count also blocks changing the ccs mode even when any
> > display activity increments the client count. During module load, fbdev
> > initialization increments the client count. ccs_mode change should be
> > independent of display.
> 
> hum... so also kernel clients? But I'm not sure it's required for kernel
> clients as kernel itself is aware of when/if it's changing.
> 
> > 
> > I am thinking does it make sense to confirm if the all ccs engines are
> > idle before attempting the ccs_mode change.
> > @Niranjana your thoughts?
> 
> 
> I think the problem is.... it changes the number of CCS engines. What
> happens if a client doesn't have something active on CCS, but was
> expecting it to match the value from a few seconds ago, before someone
> changed CCS_MODE.

Posted a new series replacing this one -
https://patchwork.freedesktop.org/series/139679/.
Please ignore this series.

Regards,
Bala

> 
> Lucas De Marchi
> 
> > 
> > Regards,
> > Bala
> > 
> > > 
> > > Lucas De Marchi
> > > 
> > > >
> > > > Tejas
> > > > >
> > > > > Regards,
> > > > > Bala
> > > > > >
> > > > > > Tejas
> > > > > > > +
> > > > > > >  	xe_pm_runtime_put(xe);
> > > > > > >  }
> > > > > > >
> > > > > > > --
> > > > > > > 2.34.1
> > > > > >


More information about the Intel-xe mailing list