Try to address the drm_debugfs issues

Maxime Ripard maxime at cerno.tech
Fri Feb 10 12:18:58 UTC 2023


On Fri, Feb 10, 2023 at 01:07:39PM +0100, Christian König wrote:
> Am 09.02.23 um 19:48 schrieb Maxime Ripard:
> > On Thu, Feb 09, 2023 at 04:52:54PM +0100, Christian König wrote:
> > > Am 09.02.23 um 15:19 schrieb Maxime Ripard:
> > > > On Thu, Feb 09, 2023 at 03:06:10PM +0100, Christian König wrote:
> > > > > Am 09.02.23 um 14:06 schrieb Maíra Canal:
> > > > > > On 2/9/23 09:13, Christian König wrote:
> > > > > > > Am 09.02.23 um 12:23 schrieb Maíra Canal:
> > > > > > > > On 2/9/23 05:18, Christian König wrote:
> > > > > > > > > Hello everyone,
> > > > > > > > > 
> > > > > > > > > the drm_debugfs has a couple of well known design problems.
> > > > > > > > > 
> > > > > > > > > Especially it wasn't possible to add files between
> > > > > > > > > initializing and registering
> > > > > > > > > of DRM devices since the underlying debugfs directory wasn't
> > > > > > > > > created yet.
> > > > > > > > > 
> > > > > > > > > The resulting necessity of the driver->debugfs_init()
> > > > > > > > > callback function is a
> > > > > > > > > mid-layering which is really frowned on since it creates a horrible
> > > > > > > > > driver->DRM->driver design layering.
> > > > > > > > > 
> > > > > > > > > The recent patch "drm/debugfs: create device-centered
> > > > > > > > > debugfs functions" tried
> > > > > > > > > to address those problem, but doesn't seem to work
> > > > > > > > > correctly. This looks like
> > > > > > > > > a misunderstanding of the call flow around
> > > > > > > > > drm_debugfs_init(), which is called
> > > > > > > > > multiple times, once for the primary and once for the render node.
> > > > > > > > > 
> > > > > > > > > So what happens now is the following:
> > > > > > > > > 
> > > > > > > > > 1. drm_dev_init() initially allocates the drm_minor objects.
> > > > > > > > > 2. ... back to the driver ...
> > > > > > > > > 3. drm_dev_register() is called.
> > > > > > > > > 
> > > > > > > > > 4. drm_debugfs_init() is called for the primary node.
> > > > > > > > > 5. drm_framebuffer_debugfs_init(), drm_client_debugfs_init() and
> > > > > > > > >       drm_atomic_debugfs_init() call drm_debugfs_add_file(s)()
> > > > > > > > > to add the files
> > > > > > > > >       for the primary node.
> > > > > > > > > 6. The driver->debugfs_init() callback is called to add
> > > > > > > > > debugfs files for the
> > > > > > > > >       primary node.
> > > > > > > > > 7. The added files are consumed and added to the primary
> > > > > > > > > node debugfs directory.
> > > > > > > > > 
> > > > > > > > > 8. drm_debugfs_init() is called for the render node.
> > > > > > > > > 9. drm_framebuffer_debugfs_init(), drm_client_debugfs_init() and
> > > > > > > > >       drm_atomic_debugfs_init() call drm_debugfs_add_file(s)()
> > > > > > > > > to add the files
> > > > > > > > >       again for the render node.
> > > > > > > > > 10. The driver->debugfs_init() callback is called to add
> > > > > > > > > debugfs files for the
> > > > > > > > >        render node.
> > > > > > > > > 11. The added files are consumed and added to the render
> > > > > > > > > node debugfs directory.
> > > > > > > > > 
> > > > > > > > > 12. Some more files are added through drm_debugfs_add_file().
> > > > > > > > > 13. drm_debugfs_late_register() add the files once more to
> > > > > > > > > the primary node
> > > > > > > > >        debugfs directory.
> > > > > > > > > 14. From this point on files added through
> > > > > > > > > drm_debugfs_add_file() are simply ignored.
> > > > > > > > > 15. ... back to the driver ...
> > > > > > > > > 
> > > > > > > > > Because of this the dev->debugfs_mutex lock is also
> > > > > > > > > completely pointless since
> > > > > > > > > any concurrent use of the interface would just randomly
> > > > > > > > > either add the files to
> > > > > > > > > the primary or render node or just not at all.
> > > > > > > > > 
> > > > > > > > > Even worse is that this implementation nails the coffin for
> > > > > > > > > removing the
> > > > > > > > > driver->debugfs_init() mid-layering because otherwise
> > > > > > > > > drivers can't control
> > > > > > > > > where their debugfs (primary/render node) are actually added.
> > > > > > > > > 
> > > > > > > > > This patch set here now tries to clean this up a bit, but
> > > > > > > > > most likely isn't
> > > > > > > > > fully complete either since I didn't audit every driver/call path.
> > > > > > > > I tested the patchset on the v3d, vc4 and vkms and all the files
> > > > > > > > are generated
> > > > > > > > as expected, but I'm getting the following errors on dmesg:
> > > > > > > > 
> > > > > > > > [    3.872026] debugfs: File 'v3d_ident' in directory '0'
> > > > > > > > already present!
> > > > > > > > [    3.872064] debugfs: File 'v3d_ident' in directory '128'
> > > > > > > > already present!
> > > > > > > > [    3.872078] debugfs: File 'v3d_regs' in directory '0' already
> > > > > > > > present!
> > > > > > > > [    3.872087] debugfs: File 'v3d_regs' in directory '128'
> > > > > > > > already present!
> > > > > > > > [    3.872097] debugfs: File 'measure_clock' in directory '0'
> > > > > > > > already present!
> > > > > > > > [    3.872105] debugfs: File 'measure_clock' in directory '128'
> > > > > > > > already present!
> > > > > > > > [    3.872116] debugfs: File 'bo_stats' in directory '0' already
> > > > > > > > present!
> > > > > > > > [    3.872124] debugfs: File 'bo_stats' in directory '128'
> > > > > > > > already present!
> > > > > > > > 
> > > > > > > > It looks like the render node is being added twice, since this
> > > > > > > > doesn't happen
> > > > > > > > for vc4 and vkms.
> > > > > > > Thanks for the feedback and yes that's exactly what I meant with
> > > > > > > that I haven't looked into all code paths.
> > > > > > > 
> > > > > > > Could it be that v3d registers it's debugfs files from the
> > > > > > > debugfs_init callback?
> > > > > > Although this is true, I'm not sure if this is the reason why the files
> > > > > > are
> > > > > > being registered twice, as this doesn't happen to vc4, and it also uses
> > > > > > the
> > > > > > debugfs_init callback. I believe it is somewhat related to the fact that
> > > > > > v3d is the primary node and the render node.
> > > > > I see. Thanks for the hint.
> > > > > 
> > > > > > Best Regards,
> > > > > > - Maíra Canal
> > > > > > 
> > > > > > > One alternative would be to just completely nuke support for
> > > > > > > separate render node debugfs files and only add a symlink to the
> > > > > > > primary node. Opinions?
> > > > > What do you think of this approach? I can't come up with any reason why we
> > > > > should have separate debugfs files for render nodes and I think it is pretty
> > > > > much the same reason you came up with the patch for per device debugfs files
> > > > > instead of per minor.
> > > > They are two entirely separate devices and drivers, it doesn't make much
> > > > sense to move their debugfs files to one or the other.
> > > Well exactly that isn't true.
> > Ok.
> > 
> > > The primary and render node are just two file under /dev for the same
> > > hardware device and driver.
> > > 
> > > We just offer different functionality through the two interfaces, but
> > > essentially there isn't any information we could expose for one which
> > > isn't true for the other as well.
> > I'd like to know what criteria you're using to say that they are the
> > same hardware device then, because they don't share the same MMIO
> > mappings, interrupts, clocks, IOMMUs, power domains, etc. They can also
> > operate independently.
> 
> Well you don't seem to understand what I'm talking about.

I would certainly like you to stop making those kind of statements.
Apart from creating unnecessary tension, they don't bring anything to
the discussion.

> This is about the primary and render node under /dev/dri/, not some
> separate hw device.

The thing is, vc4 and v3d are both different nodes under /dev/dri and
separate hw devices.

> So you really have only one hardware device. E.g. clocks, IOMMU, power
> etc... is all the same.

Well, I mean, you can claim that all you want, but they certainly aren't
the same hardware device. Just like on virtually any !x86 SoC, the GPU
and display engines aren't the same device, and most of the time don't
even come from the same vendor.

Going back to the initial issue, one of the files exposed by the v3d
driver is the v3d registers content. It makes no sense to expose the v3d
registers into the primary (vc4) node when the hardware doesn't match,
and v3d has its own node.

Maxime


More information about the dri-devel mailing list