[PATCH v2 2/2] drm/xe: Use the filelist from drm for ccs_mode change

Lucas De Marchi lucas.demarchi at intel.com
Sat Oct 12 23:45:57 UTC 2024


It looks like I forgot to Cc, so let's try again.

Lucas De Marchi

On Sat, Oct 12, 2024 at 04:43:16AM +0000, Matthew Brost wrote:
>On Thu, Oct 10, 2024 at 11:02:56AM -0500, Lucas De Marchi wrote:
>> On Thu, Oct 10, 2024 at 04:33:08PM +0530, Vivekanandan, Balasubramani wrote:
>> > On 08.10.2024 18:00, Matthew Brost wrote:
>> > > On Tue, Oct 08, 2024 at 05:24:32PM +0000, Matthew Brost wrote:
>> > > > On Tue, Oct 08, 2024 at 11:55:03AM -0500, Lucas De Marchi wrote:
>> > > > > On Tue, Oct 08, 2024 at 03:42:09PM +0000, Matthew Brost wrote:
>> > > > > > On Tue, Oct 08, 2024 at 10:20:15AM -0500, Lucas De Marchi wrote:
>> > > > > > > On Tue, Oct 08, 2024 at 02:54:19PM +0000, Matthew Brost wrote:
>> > > > > > > > On Tue, Oct 08, 2024 at 01:06:28PM +0530, Balasubramani Vivekanandan wrote:
>> > > > > > > > > Drop the exclusive client count tracking and use the filelist from the
>> > > > > > > > > drm to track the active clients. This also ensures the clients created
>> > > > > > > > > internally by the driver won't block changing the ccs mode.
>> > > > > > > > >
>> > > > > > > > > Fixes: ce8c161cbad4 ("drm/xe: Add ref counting for xe_file")
>> > > > > > > >
>> > > > > > > > Is this really fixing anything? As far as I can tell nothing upstream
>> > > > > > > > opens a file internally (i.e. xe_file_open is never called directly).
>> > > > > > >
>> > > > > > > should fix this case:
>> > > > > > >
>> > > > > > > 	open()
>> > > > > > > 	close()
>> > > > > > > 			<---- race here
>> > > > > > > 	change_ccs_mode()
>> > > > > > >
>> > > > > > > because the close is not completely synchronous - the cleanup where
>> > > > > > > the client count is decremented runs too late and is subject to the
>> > > > > > > race fixed here.
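>> > > > > > >
>> > > > > > > roughly the shape of it (a sketch, untested; the helper for applying
>> > > > > > > the mode is a stand-in name):
>> > > > > > >
>> > > > > > > 	/* gate the ccs_mode change on drm's own filelist instead
>> > > > > > > 	 * of a driver-local client counter */
>> > > > > > > 	mutex_lock(&xe->drm.filelist_mutex);
>> > > > > > > 	if (!list_empty(&xe->drm.filelist)) {
>> > > > > > > 		mutex_unlock(&xe->drm.filelist_mutex);
>> > > > > > > 		return -EBUSY;
>> > > > > > > 	}
>> > > > > > >
>> > > > > > > 	/* no new client can open while filelist_mutex is held */
>> > > > > > > 	xe_gt_apply_ccs_mode(gt, num_engines);
>> > > > > > >
>> > > > > > > 	mutex_unlock(&xe->drm.filelist_mutex);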
>> > > > > > >
>> > > > > >
>> > > > > > Ah, ok. But then IMO just move the clients.count decrement to
>> > > > > > xe_file_close. I try to preach solid locking / layering and this seems
>> > > > > > to go against that idea.
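>> > > > > >
>> > > > > > i.e. something like (a sketch against the pre-patch code, assuming
>> > > > > > the counter is xe->clients.count):
>> > > > > >
>> > > > > > 	static void xe_file_close(struct drm_device *dev, struct drm_file *file)
>> > > > > > 	{
>> > > > > > 		struct xe_device *xe = to_xe_device(dev);
>> > > > > >
>> > > > > > 		/* decrement synchronously here instead of in the
>> > > > > > 		 * deferred xe_file destructor, closing the race */
>> > > > > > 		atomic_dec(&xe->clients.count);
>> > > > > >
>> > > > > > 		...
>> > > > > > 	}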
>> > > > >
>> > > > > why do you want to track the exact same thing in 2 ways, one in drm and
>> > > > > the other in xe? "Are there clients connected?" is something drm_file
>> > > > > already answers and doesn't need to be duplicated in the individual
>> > > > > drivers.
>> > > > >
>> > > >
>> > > > Well, layering, making assumptions about what the file list means, and
>> > > > not randomly deciding what we can and can't do with locks we don't own.
>> > > >
>> > >
>> > > I looked at this a bit more, if we want to do it this way then I think
>> > > at a minimum we need a drm helper for this.
>> > >
>> > > e.g.,
>> > >
>> > > bool drm_device_user_clients_open(struct drm_device *dev)
>> > > {
>> > > 	/* caller must hold the filelist lock */
>> > > 	lockdep_assert_held(&dev->filelist_mutex);
>> > >
>> > > 	/* non-empty filelist means at least one client is open */
>> > > 	return !list_empty(&dev->filelist);
>> > > }
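>> > >
>> > > with the driver holding the lock across the check and the mode change,
>> > > e.g. (hypothetical caller):
>> > >
>> > > 	mutex_lock(&xe->drm.filelist_mutex);
>> > > 	if (!drm_device_user_clients_open(&xe->drm)) {
>> > > 		/* safe to change ccs_mode; the lock blocks new opens */
>> > > 	}
>> > > 	mutex_unlock(&xe->drm.filelist_mutex);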
>> >
>> > Still the driver would be required to use the filelist_mutex outside the
>> > drm layer, so it will not solve the layering problem completely.
>> > Otherwise we can define two helper functions in drm: the first one would
>> > check that there are no open files and return true while holding the
>> > mutex; the second would release the mutex. Does that sound too invasive?
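>> >
>> > Something along these lines (a rough sketch, hypothetical names):
>> >
>> > 	/* returns true with filelist_mutex held iff no files are open;
>> > 	 * on true the caller must call drm_device_no_clients_end() */
>> > 	bool drm_device_no_clients_begin(struct drm_device *dev)
>> > 	{
>> > 		mutex_lock(&dev->filelist_mutex);
>> > 		if (list_empty(&dev->filelist))
>> > 			return true;
>> >
>> > 		mutex_unlock(&dev->filelist_mutex);
>> > 		return false;
>> > 	}
>> >
>> > 	void drm_device_no_clients_end(struct drm_device *dev)
>> > 	{
>> > 		mutex_unlock(&dev->filelist_mutex);
>> > 	}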
>
>I'd say add helpers to acquire / release the file lock, indicating it is
>ok for other layers to use the lock. Then add helpers with annotations
>(like my example) for internal access to that layer. Maybe a bit much,
>but I find myself writing code in this style these days.
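>
>i.e. (hypothetical names, untested):
>
>	void drm_device_filelist_lock(struct drm_device *dev)
>	{
>		mutex_lock(&dev->filelist_mutex);
>	}
>
>	void drm_device_filelist_unlock(struct drm_device *dev)
>	{
>		mutex_unlock(&dev->filelist_mutex);
>	}
>
>with drm_device_user_clients_open() from my earlier example as the
>lockdep-annotated accessor for the list itself.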
>
>>
>> yes, and I'd prefer not to add these helpers.  If someone wants
>> to add helpers later and convert the current use cases, that's a
>> cleanup/refactor that can be done on top. For me, these helpers are not
>> really helping.
>>
>> Rodrigo / Thomas, any thought?
>
>Another opinion would be good. I know this seems a bit pedantic, but
>locking is about the most important thing in a driver and certainly the
>most difficult to unwind if we do it incorrectly.
>
>Matt
>
>>
>> Lucas De Marchi
>>
>> >
>> > Regards,
>> > Bala
>> >
>> > >
>> > > Matt
>> > >
>> > > > > At least it's also used for similar reasons in amdgpu and vmwgfx:
>> > > > >
>> > > > > $ git grep -l -e "->filelist" -- drivers/gpu/drm/
>> > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> > > > > drivers/gpu/drm/drm_client.c
>> > > > > drivers/gpu/drm/drm_debugfs.c
>> > > > > drivers/gpu/drm/drm_drv.c
>> > > > > drivers/gpu/drm/drm_file.c
>> > > > > drivers/gpu/drm/vmwgfx/vmwgfx_gem.c
>> > > > >
>> > > >
>> > > > The AMD / VMWGFX usage certainly looks wrong to me. If the file list
>> > > > were meant to be traversed by drivers, there should be an exported
>> > > > iterator IMO.
>> > > >
>> > > > Matt
>> > > >
>> > > > >
>> > > > > Lucas De Marchi

