[PATCH v2 2/2] drm/xe: Use the filelist from drm for ccs_mode change

Matthew Brost matthew.brost at intel.com
Fri Nov 1 14:23:43 UTC 2024


On Fri, Nov 01, 2024 at 08:11:05AM -0500, Lucas De Marchi wrote:
> On Sat, Oct 12, 2024 at 06:45:57PM -0500, Lucas De Marchi wrote:
> > It looks like I forgot to Cc, so let's try again.
> > 
> > Lucas De Marchi
> > 
> > On Sat, Oct 12, 2024 at 04:43:16AM +0000, Matthew Brost wrote:
> > > On Thu, Oct 10, 2024 at 11:02:56AM -0500, Lucas De Marchi wrote:
> > > > On Thu, Oct 10, 2024 at 04:33:08PM +0530, Vivekanandan, Balasubramani wrote:
> > > > > On 08.10.2024 18:00, Matthew Brost wrote:
> > > > > > On Tue, Oct 08, 2024 at 05:24:32PM +0000, Matthew Brost wrote:
> > > > > > > On Tue, Oct 08, 2024 at 11:55:03AM -0500, Lucas De Marchi wrote:
> > > > > > > > On Tue, Oct 08, 2024 at 03:42:09PM +0000, Matthew Brost wrote:
> > > > > > > > > On Tue, Oct 08, 2024 at 10:20:15AM -0500, Lucas De Marchi wrote:
> > > > > > > > > > On Tue, Oct 08, 2024 at 02:54:19PM +0000, Matthew Brost wrote:
> > > > > > > > > > > On Tue, Oct 08, 2024 at 01:06:28PM +0530, Balasubramani Vivekanandan wrote:
> > > > > > > > > > > > Drop the exclusive client count tracking and use the filelist from the
> > > > > > > > > > > > drm to track the active clients. This also ensures the clients created
> > > > > > > > > > > > internally by the driver won't block changing the ccs mode.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
> > > > > > > > > > >
> > > > > > > > > > > Is this really fixing anything. As far as I can tell nothing upstream
> > > > > > > > > > > opens a file internally (i.e. xe_file_open) is never called directly.
> > > > > > > > > >
> > > > > > > > > > should fix this case:
> > > > > > > > > >
> > > > > > > > > > 	open()
> > > > > > > > > > 	close()
> > > > > > > > > > 			<---- race here
> > > > > > > > > > 	change_ccs_mode()
> > > > > > > > > >
> > > > > > > > > > because the close is not completely sync - the cleanup where the
> > > > > > > > > > previous number of clients is decremented is executed too late and
> > > > > > > > > > subject to a race fixed here.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ah, ok. But then IMO just move the clients.count decrement to
> > > > > > > > > xe_file_close. I try to preach solid locking / layering and this seems
> > > > > > > > > to go against this idea.
> > > > > > > >
> > > > > > > > why do you want to track the exact same thing in 2 ways, one in drm and
> > > > > > > > the other in xe? "Are there clients connected?" is something drm_file
> > > > > > > > answer and doesn't need to be duplicated in the individual drivers.
> > > > > > > >
> > > > > > >
> > > > > > > Well layering, making assumptions about the file list means, and not
> > > > > > > randomly deciding what we can and can't do with locks we don't own.
> > > > > > >
> > > > > >
> > > > > > I looked at this a bit more, if we want to do it this way then I think
> > > > > > at a minimum we need a drm helper for this.
> > > > > >
> > > > > > e.g.,
> > > > > >
> > > > > > bool drm_device_user_clients_open(struct drm_device *dev)
> > > > > > {
> > > > > > 	lockdep_assert(&dev->filelist_mutex);
> > > > > >
> > > > > > 	return !!list_empty(&dev->filelist);
> > > > > > }
> > > > > 
> > > > > Still the driver would be required to use the filelist_mutex outside the
> > > > > drm layer. So will not solve the layering problem completely.
> > > > > Otherwise we can define two helper functions in drm, the first one would
> > > > > check there are no open files and returns true holding the mutex.
> > > > > The second function can be used to release the mutex.
> > > > > Does it sound too invasive?
> > > 
> > > I'd say helper to acquire / release the file lock indicating it is ok
> > > for other layers to use the lock. Then add helpers with annotaions (like
> > > my example) for internal access to that layer. Maybe a bit much, I find
> > > myself writing code in this style these days.
> > > 
> > > > 
> > > > yes, and I'd rather prefer not adding these helpers.  If someone wants
> > > > to add helpers later and convert the current use cases, that's a
> > > > cleanup/refactor that can be done on top. For me, these helpers are not
> > > > really helping.
> > > > 
> > > > Rodrigo / Thomas, any thought?
> > > 
> > > Another opinion would be good. I know this seems a bit pedantic but
> > > locking is about the most important thing a driver and certainly the
> > > most difficult to unwind is we do it incorrectly.
> 
> I chatted with Thomas today regarding this. He is also of the opinion
> that it'd be better to have wrappers in the drm layer. However since
> there are already users, it's ok to add this as is and prep a conversion
> on top.
> 
> From a quick look, we'd need to add:
> 
> 	drm_device_user_clients_lock()
> 	drm_device_user_clients_lock_interruptible()
> 	drm_device_user_clients_unlock()
> 	drm_device_user_clients_empty_locked()
> 	drm_device_user_clients_empty()
> 

This seem reasonsible and makes it clear drivers how drivers can use these.

> and discuss it in dri-devel to convert the current users. Maybe we will
> also need some renames or doc updates since drm_device has:
> 
> 	filelist, protected by filelist_mutex
> 	filelist_internal, protected by filelist_mutex
> 	clientlist, clientlist_mutex
>

Some doc updates would be good.
 
> which would point to actually add the helpers as
> drm_device_filelist_* rather than user_clients
>
> 
> For now, applied both patches to drm-xe-next.
>

Thanks for following up here. Will be on the lookup for dridevel patches.

Matt

> 
> Lucas De Marchi
> 
> > > 
> > > Matt
> > > 
> > > > 
> > > > Lucas De Marchi
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Bala
> > > > > 
> > > > > >
> > > > > > Matt
> > > > > >
> > > > > > > > At least it's also used for similar reasons in amdgpu and vmwgfx:
> > > > > > > >
> > > > > > > > $ git grep -l -e "->filelist" -- drivers/gpu/drm/
> > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > > drivers/gpu/drm/drm_client.c
> > > > > > > > drivers/gpu/drm/drm_debugfs.c
> > > > > > > > drivers/gpu/drm/drm_drv.c
> > > > > > > > drivers/gpu/drm/drm_file.c
> > > > > > > > drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
> > > > > > > >
> > > > > > >
> > > > > > > The AMD / VMWGFX usage certainly looks wrong to me. If the file list was
> > > > > > > meant to be traversed by drivers there should be an exported iterator
> > > > > > > IMO.
> > > > > > >
> > > > > > > Matt
> > > > > > >
> > > > > > > >
> > > > > > > > Lucas De Marchi


More information about the Intel-xe mailing list