[PATCH] drm/xe: Decrement client count immediately on file close

Lucas De Marchi lucas.demarchi at intel.com
Wed Sep 18 20:43:19 UTC 2024


On Wed, Sep 18, 2024 at 11:11:10AM GMT, Upadhyay, Tejas wrote:
>
>
>> -----Original Message-----
>> From: Vivekanandan, Balasubramani
>> <balasubramani.vivekanandan at intel.com>
>> Sent: Wednesday, September 18, 2024 3:33 PM
>> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
>> xe at lists.freedesktop.org
>> Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
>> Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
>> Marchi, Lucas <lucas.demarchi at intel.com>
>> Subject: Re: [PATCH] drm/xe: Decrement client count immediately on file close
>>
>> On 18.09.2024 15:09, Upadhyay, Tejas wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
>> > > Balasubramani Vivekanandan
>> > > Sent: Wednesday, September 18, 2024 1:42 PM
>> > > To: intel-xe at lists.freedesktop.org
>> > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
>> > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
>> > > Marchi, Lucas <lucas.demarchi at intel.com>; Vivekanandan,
>> > > Balasubramani <balasubramani.vivekanandan at intel.com>
>> > > Subject: [PATCH] drm/xe: Decrement client count immediately on file
>> > > close
>> > >
>> > > Decrement the client count immediately on file close. It is not
>> > > required to be deferred to the resource cleanup function. Otherwise
>> > > there will be a small time window, where there will be a non-zero
>> > > client count even after closing all open file handles.
>> > > This affects ccs_mode(xe_compute) igt tests as these tests try to
>> > > change the ccs_mode immediately after closing all file handles, but
>> > > the driver rejects the ccs_mode change request as it sees a non-zero client
>> count.
>> > >
>> > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
>> > > Signed-off-by: Balasubramani Vivekanandan
>> > > <balasubramani.vivekanandan at intel.com>
>> > > ---
>> > >  drivers/gpu/drm/xe/xe_device.c | 9 ++++-----
>> > >  1 file changed, 4 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/xe/xe_device.c
>> > > b/drivers/gpu/drm/xe/xe_device.c index 4d3c794f134c..3bccea6212ff
>> > > 100644
>> > > --- a/drivers/gpu/drm/xe/xe_device.c
>> > > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > > @@ -107,17 +107,12 @@ static int xe_file_open(struct drm_device
>> > > *dev, struct drm_file *file)  static void xe_file_destroy(struct kref *ref)  {
>> > >  	struct xe_file *xef = container_of(ref, struct xe_file, refcount);
>> > > -	struct xe_device *xe = xef->xe;
>> > >
>> > >  	xa_destroy(&xef->exec_queue.xa);
>> > >  	mutex_destroy(&xef->exec_queue.lock);
>> > >  	xa_destroy(&xef->vm.xa);
>> > >  	mutex_destroy(&xef->vm.lock);
>> > >
>> > > -	spin_lock(&xe->clients.lock);
>> > > -	xe->clients.count--;
>> > > -	spin_unlock(&xe->clients.lock);
>> > > -
>> > >  	xe_drm_client_put(xef->client);
>> > >  	kfree(xef->process_name);
>> > >  	kfree(xef);
>> > > @@ -178,6 +173,10 @@ static void xe_file_close(struct drm_device
>> > > *dev, struct drm_file *file)
>> > >
>> > >  	xe_file_put(xef);
>> > >
>> > > +	spin_lock(&xe->clients.lock);
>> > > +	xe->clients.count--;
>> > > +	spin_unlock(&xe->clients.lock);
>> >
>> > The file_close here is sychronus and serialized call with respect to
>> userspace. Any settings done through sysfs post file_close should not required
>> this change as far as I know. Would please explain scenario better?
>>
>> In the current code, the client count is decremented in the function
>> xe_file_destroy which is not invoked synchronously from xe_file_close.
>> It is called when all references to xe_file are lost.
>> References to xe_file are held during creation of vm and exec_queues. So the
>> somebody might still be holding reference to xe_file while xe_file_close is
>> called. Therefore the invocation of xe_file_destroy might be deferred.
>> As of result, driver might see a non-zero client count even after all file handles
>> are actually closed which is incorrect. We can defer only the freeing of
>> resources to xe_file_destroy but the client count can be immediately adjusted
>> in xe_file_close.
>
>If whoever has hold ref, might still use ccs, why would we allow changing ccs mode in that case!

if the file is closed, there wouldn't be any submission at all. ref is
kept alive for the data to be available for updates on the sw side only.

but I'm wondering on the value of this client tracking... this is so
uncommon that we maybe should rely just on the list of open files being
empty?

Lucas De Marchi

>
>Tejas
>>
>> Regards,
>> Bala
>> >
>> > Tejas
>> > > +
>> > >  	xe_pm_runtime_put(xe);
>> > >  }
>> > >
>> > > --
>> > > 2.34.1
>> >


More information about the Intel-xe mailing list