Try to address the drm_debugfs issues

Christian König ckoenig.leichtzumerken at gmail.com
Fri Feb 10 12:07:39 UTC 2023


Am 09.02.23 um 19:48 schrieb Maxime Ripard:
> On Thu, Feb 09, 2023 at 04:52:54PM +0100, Christian König wrote:
>> Am 09.02.23 um 15:19 schrieb Maxime Ripard:
>>> On Thu, Feb 09, 2023 at 03:06:10PM +0100, Christian König wrote:
>>>> Am 09.02.23 um 14:06 schrieb Maíra Canal:
>>>>> On 2/9/23 09:13, Christian König wrote:
>>>>>> Am 09.02.23 um 12:23 schrieb Maíra Canal:
>>>>>>> On 2/9/23 05:18, Christian König wrote:
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> the drm_debugfs has a couple of well known design problems.
>>>>>>>>
>>>>>>>> Especially it wasn't possible to add files between
>>>>>>>> initializing and registering
>>>>>>>> of DRM devices since the underlying debugfs directory wasn't
>>>>>>>> created yet.
>>>>>>>>
>>>>>>>> The resulting necessity of the driver->debugfs_init()
>>>>>>>> callback function is a
>>>>>>>> mid-layering which is really frowned on since it creates a horrible
>>>>>>>> driver->DRM->driver design layering.
>>>>>>>>
>>>>>>>> The recent patch "drm/debugfs: create device-centered
>>>>>>>> debugfs functions" tried
>>>>>>>> to address those problem, but doesn't seem to work
>>>>>>>> correctly. This looks like
>>>>>>>> a misunderstanding of the call flow around
>>>>>>>> drm_debugfs_init(), which is called
>>>>>>>> multiple times, once for the primary and once for the render node.
>>>>>>>>
>>>>>>>> So what happens now is the following:
>>>>>>>>
>>>>>>>> 1. drm_dev_init() initially allocates the drm_minor objects.
>>>>>>>> 2. ... back to the driver ...
>>>>>>>> 3. drm_dev_register() is called.
>>>>>>>>
>>>>>>>> 4. drm_debugfs_init() is called for the primary node.
>>>>>>>> 5. drm_framebuffer_debugfs_init(), drm_client_debugfs_init() and
>>>>>>>>       drm_atomic_debugfs_init() call drm_debugfs_add_file(s)()
>>>>>>>> to add the files
>>>>>>>>       for the primary node.
>>>>>>>> 6. The driver->debugfs_init() callback is called to add
>>>>>>>> debugfs files for the
>>>>>>>>       primary node.
>>>>>>>> 7. The added files are consumed and added to the primary
>>>>>>>> node debugfs directory.
>>>>>>>>
>>>>>>>> 8. drm_debugfs_init() is called for the render node.
>>>>>>>> 9. drm_framebuffer_debugfs_init(), drm_client_debugfs_init() and
>>>>>>>>       drm_atomic_debugfs_init() call drm_debugfs_add_file(s)()
>>>>>>>> to add the files
>>>>>>>>       again for the render node.
>>>>>>>> 10. The driver->debugfs_init() callback is called to add
>>>>>>>> debugfs files for the
>>>>>>>>        render node.
>>>>>>>> 11. The added files are consumed and added to the render
>>>>>>>> node debugfs directory.
>>>>>>>>
>>>>>>>> 12. Some more files are added through drm_debugfs_add_file().
>>>>>>>> 13. drm_debugfs_late_register() add the files once more to
>>>>>>>> the primary node
>>>>>>>>        debugfs directory.
>>>>>>>> 14. From this point on files added through
>>>>>>>> drm_debugfs_add_file() are simply ignored.
>>>>>>>> 15. ... back to the driver ...
>>>>>>>>
>>>>>>>> Because of this the dev->debugfs_mutex lock is also
>>>>>>>> completely pointless since
>>>>>>>> any concurrent use of the interface would just randomly
>>>>>>>> either add the files to
>>>>>>>> the primary or render node or just not at all.
>>>>>>>>
>>>>>>>> Even worse is that this implementation nails the coffin for
>>>>>>>> removing the
>>>>>>>> driver->debugfs_init() mid-layering because otherwise
>>>>>>>> drivers can't control
>>>>>>>> where their debugfs (primary/render node) are actually added.
>>>>>>>>
>>>>>>>> This patch set here now tries to clean this up a bit, but
>>>>>>>> most likely isn't
>>>>>>>> fully complete either since I didn't audit every driver/call path.
>>>>>>> I tested the patchset on the v3d, vc4 and vkms and all the files
>>>>>>> are generated
>>>>>>> as expected, but I'm getting the following errors on dmesg:
>>>>>>>
>>>>>>> [    3.872026] debugfs: File 'v3d_ident' in directory '0'
>>>>>>> already present!
>>>>>>> [    3.872064] debugfs: File 'v3d_ident' in directory '128'
>>>>>>> already present!
>>>>>>> [    3.872078] debugfs: File 'v3d_regs' in directory '0' already
>>>>>>> present!
>>>>>>> [    3.872087] debugfs: File 'v3d_regs' in directory '128'
>>>>>>> already present!
>>>>>>> [    3.872097] debugfs: File 'measure_clock' in directory '0'
>>>>>>> already present!
>>>>>>> [    3.872105] debugfs: File 'measure_clock' in directory '128'
>>>>>>> already present!
>>>>>>> [    3.872116] debugfs: File 'bo_stats' in directory '0' already
>>>>>>> present!
>>>>>>> [    3.872124] debugfs: File 'bo_stats' in directory '128'
>>>>>>> already present!
>>>>>>>
>>>>>>> It looks like the render node is being added twice, since this
>>>>>>> doesn't happen
>>>>>>> for vc4 and vkms.
>>>>>> Thanks for the feedback and yes that's exactly what I meant with
>>>>>> that I haven't looked into all code paths.
>>>>>>
>>>>>> Could it be that v3d registers it's debugfs files from the
>>>>>> debugfs_init callback?
>>>>> Although this is true, I'm not sure if this is the reason why the files
>>>>> are
>>>>> being registered twice, as this doesn't happen to vc4, and it also uses
>>>>> the
>>>>> debugfs_init callback. I believe it is somewhat related to the fact that
>>>>> v3d is the primary node and the render node.
>>>> I see. Thanks for the hint.
>>>>
>>>>> Best Regards,
>>>>> - Maíra Canal
>>>>>
>>>>>> One alternative would be to just completely nuke support for
>>>>>> separate render node debugfs files and only add a symlink to the
>>>>>> primary node. Opinions?
>>>> What do you think of this approach? I can't come up with any reason why we
>>>> should have separate debugfs files for render nodes and I think it is pretty
>>>> much the same reason you came up with the patch for per device debugfs files
>>>> instead of per minor.
>>> They are two entirely separate devices and drivers, it doesn't make much
>>> sense to move their debugfs files to one or the other.
>> Well exactly that isn't true.
> Ok.
>
>> The primary and render node are just two file under /dev for the same
>> hardware device and driver.
>>
>> We just offer different functionality through the two interfaces, but
>> essentially there isn't any information we could expose for one which
>> isn't true for the other as well.
> I'd like to know what criteria you're using to say that they are the
> same hardware device then, because they don't share the same MMIO
> mappings, interrupts, clocks, IOMMUs, power domains, etc. They can also
> operate independently.

Well you don't seem to understand what I'm talking about.

This is about the primary and render node under /dev/dri/, not some 
separate hw device.

So you really have only one hardware device. E.g. clocks, IOMMU, power 
etc... is all the same. It's just one physical device which only one 
drm_device structure.

Regards,
Christian.

>
> So unless that criteria is that they share the RAM, they cannot be
> considered the same hardware device.
>
> Maxime



More information about the dri-devel mailing list