[PATCH] drm/xe: Decrement client count immediately on file close

Lucas De Marchi lucas.demarchi at intel.com
Thu Sep 19 11:55:35 UTC 2024


On Thu, Sep 19, 2024 at 01:49:04PM GMT, Vivekanandan, Balasubramani wrote:
>On 18.09.2024 15:43, Lucas De Marchi wrote:
>> On Wed, Sep 18, 2024 at 11:11:10AM GMT, Upadhyay, Tejas wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Vivekanandan, Balasubramani
>> > > <balasubramani.vivekanandan at intel.com>
>> > > Sent: Wednesday, September 18, 2024 3:33 PM
>> > > To: Upadhyay, Tejas <tejas.upadhyay at intel.com>; intel-
>> > > xe at lists.freedesktop.org
>> > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
>> > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
>> > > Marchi, Lucas <lucas.demarchi at intel.com>
>> > > Subject: Re: [PATCH] drm/xe: Decrement client count immediately on file close
>> > >
>> > > On 18.09.2024 15:09, Upadhyay, Tejas wrote:
>> > > >
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of
>> > > > > Balasubramani Vivekanandan
>> > > > > Sent: Wednesday, September 18, 2024 1:42 PM
>> > > > > To: intel-xe at lists.freedesktop.org
>> > > > > Cc: Nerlige Ramappa, Umesh <umesh.nerlige.ramappa at intel.com>;
>> > > > > Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>; De
>> > > > > Marchi, Lucas <lucas.demarchi at intel.com>; Vivekanandan,
>> > > > > Balasubramani <balasubramani.vivekanandan at intel.com>
>> > > > > Subject: [PATCH] drm/xe: Decrement client count immediately on file
>> > > > > close
>> > > > >
>> > > > > Decrement the client count immediately on file close. It does not
>> > > > > need to be deferred to the resource cleanup function. Otherwise
>> > > > > there is a small time window with a non-zero client count even
>> > > > > after all open file handles have been closed.
>> > > > > This affects the ccs_mode (xe_compute) igt tests: they try to
>> > > > > change the ccs_mode immediately after closing all file handles,
>> > > > > but the driver rejects the change request because it still sees a
>> > > > > non-zero client count.
>> > > > >
>> > > > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
>> > > > > Signed-off-by: Balasubramani Vivekanandan
>> > > > > <balasubramani.vivekanandan at intel.com>
>> > > > > ---
>> > > > >  drivers/gpu/drm/xe/xe_device.c | 9 ++++-----
>> > > > >  1 file changed, 4 insertions(+), 5 deletions(-)
>> > > > >
>> > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> > > > > index 4d3c794f134c..3bccea6212ff 100644
>> > > > > --- a/drivers/gpu/drm/xe/xe_device.c
>> > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > > > > @@ -107,17 +107,12 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
>> > > > >  static void xe_file_destroy(struct kref *ref)
>> > > > >  {
>> > > > >  	struct xe_file *xef = container_of(ref, struct xe_file, refcount);
>> > > > > -	struct xe_device *xe = xef->xe;
>> > > > >
>> > > > >  	xa_destroy(&xef->exec_queue.xa);
>> > > > >  	mutex_destroy(&xef->exec_queue.lock);
>> > > > >  	xa_destroy(&xef->vm.xa);
>> > > > >  	mutex_destroy(&xef->vm.lock);
>> > > > >
>> > > > > -	spin_lock(&xe->clients.lock);
>> > > > > -	xe->clients.count--;
>> > > > > -	spin_unlock(&xe->clients.lock);
>> > > > > -
>> > > > >  	xe_drm_client_put(xef->client);
>> > > > >  	kfree(xef->process_name);
>> > > > >  	kfree(xef);
>> > > > > @@ -178,6 +173,10 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
>> > > > >
>> > > > >  	xe_file_put(xef);
>> > > > >
>> > > > > +	spin_lock(&xe->clients.lock);
>> > > > > +	xe->clients.count--;
>> > > > > +	spin_unlock(&xe->clients.lock);
>> > > >
>> > > > The file_close here is a synchronous and serialized call with respect to
>> > > > userspace. Any settings done through sysfs after file_close should not
>> > > > require this change as far as I know. Could you please explain the
>> > > > scenario better?
>> > >
>> > > In the current code, the client count is decremented in the function
>> > > xe_file_destroy, which is not invoked synchronously from xe_file_close.
>> > > It is called when all references to the xe_file are dropped.
>> > > References to xe_file are taken when vms and exec_queues are created, so
>> > > somebody might still be holding a reference to the xe_file when
>> > > xe_file_close is called, and the invocation of xe_file_destroy can be
>> > > deferred. As a result, the driver might see a non-zero client count even
>> > > after all file handles are actually closed, which is incorrect. We can
>> > > defer only the freeing of resources to xe_file_destroy; the client count
>> > > can be adjusted immediately in xe_file_close.
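
To put the lifetime issue in code: this is the usual kref pattern, so the
destroy callback is not tied to the close at all. A stripped-down sketch
(xe_file_get/xe_file_put follow the refcounting added in ce8c161cbad4; the
vm->xef assignment is written from memory, so purely illustrative):

	static inline struct xe_file *xe_file_get(struct xe_file *xef)
	{
		kref_get(&xef->refcount);
		return xef;
	}

	static inline void xe_file_put(struct xe_file *xef)
	{
		kref_put(&xef->refcount, xe_file_destroy);
	}

	/* a VM (or exec queue) keeps its own reference across the close */
	vm->xef = xe_file_get(xef);

	/*
	 * ... so the xe_file_put() in xe_file_close() does not necessarily
	 * reach xe_file_destroy(); destroy only runs once the last such
	 * reference is dropped, which is why a decrement placed there can
	 * lag well behind the actual close.
	 */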
>> >
>> > If whoever still holds a ref might still use CCS, why would we allow changing the ccs mode in that case?
>>
>> if the file is closed, there won't be any submission at all. The ref is
>> kept alive only so the data remains available for updates on the sw side.
>>
>> but I'm wondering about the value of this client tracking... this is so
>> uncommon that maybe we should just rely on the list of open files being
>> empty?
>
>Using this client count also blocks changing the ccs mode whenever
>display activity increments the client count. During module load, fbdev
>initialization increments the client count. The ccs_mode change should be
>independent of display.

hum... so also kernel clients? But I'm not sure it's needed for kernel
clients, as the kernel itself is aware of when/if it's changing.
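
If we keyed this off the DRM file list instead, something like the sketch
below would cover both points: only userspace file handles block the
change, and kernel-internal clients such as the fbdev one never show up on
that list. Illustrative only, not a patch (the helper name is made up;
filelist/filelist_mutex are the DRM core fields):

	/*
	 * Illustrative replacement for the xe->clients.count check in the
	 * ccs_mode store path.  Kernel-internal clients (e.g. fbdev) have
	 * no entry on drm_device.filelist, so they would not block the
	 * change.
	 */
	static bool xe_has_open_files(struct xe_device *xe)
	{
		bool busy;

		mutex_lock(&xe->drm.filelist_mutex);
		busy = !list_empty(&xe->drm.filelist);
		mutex_unlock(&xe->drm.filelist_mutex);

		return busy;
	}

	/* in the ccs_mode sysfs store handler: */
	if (xe_has_open_files(xe))
		return -EBUSY;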

>
>I am wondering whether it makes sense to confirm that all CCS engines are
>idle before attempting the ccs_mode change.
>@Niranjana, your thoughts?


I think the problem is... it changes the number of CCS engines. What
happens if a client doesn't have anything active on CCS, but was
expecting the engine count to match the value it saw a few seconds ago,
before someone changed CCS_MODE?
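
To make that concrete from the userspace side, a rough sketch (uAPI names
written from memory, so double-check them against xe_drm.h): the client
queries the engine list once and caches how many compute engines it saw;
if CCS_MODE is flipped afterwards, that cached number is stale even though
the client never submitted anything to CCS.

	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <xe_drm.h>

	static int count_ccs_engines(int fd)
	{
		struct drm_xe_device_query query = {
			.query = DRM_XE_DEVICE_QUERY_ENGINES,
		};
		struct drm_xe_query_engines *engines;
		uint32_t i;
		int count = 0;

		/* first call with size 0 asks how much space is needed */
		if (ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query))
			return -1;

		engines = calloc(1, query.size);
		if (!engines)
			return -1;

		query.data = (uintptr_t)engines;
		if (ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query)) {
			free(engines);
			return -1;
		}

		for (i = 0; i < engines->num_engines; i++)
			if (engines->engines[i].instance.engine_class ==
			    DRM_XE_ENGINE_CLASS_COMPUTE)
				count++;

		free(engines);

		/*
		 * If ccs_mode changes after this point, 'count' is stale
		 * even though this client has nothing running on CCS.
		 */
		return count;
	}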

Lucas De Marchi

>
>Regards,
>Bala
>
>>
>> Lucas De Marchi
>>
>> >
>> > Tejas
>> > >
>> > > Regards,
>> > > Bala
>> > > >
>> > > > Tejas
>> > > > > +
>> > > > >  	xe_pm_runtime_put(xe);
>> > > > >  }
>> > > > >
>> > > > > --
>> > > > > 2.34.1
>> > > >

