[Intel-gfx] [PATCH 03/52] drm: add managed resources tied to drm_device
Daniel Vetter
daniel.vetter at ffwll.ch
Wed Feb 19 16:22:38 UTC 2020
On Wed, Feb 19, 2020 at 5:09 PM Emil Velikov <emil.l.velikov at gmail.com> wrote:
>
> On Wed, 19 Feb 2020 at 14:23, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> >
> > On Wed, Feb 19, 2020 at 2:33 PM Greg Kroah-Hartman
> > <gregkh at linuxfoundation.org> wrote:
> > >
> > > On Wed, Feb 19, 2020 at 03:28:47PM +0200, Laurent Pinchart wrote:
> > > > Hi Daniel,
> > > >
> > > > Thank you for the patch.
> > > >
> > > > On Wed, Feb 19, 2020 at 11:20:33AM +0100, Daniel Vetter wrote:
> > > > > We have lots of these. And the cleanup code tends to be of dubious
> > > > > quality. The biggest wrong pattern is that developers use devm_, which
> > > > > ties the release action to the underlying struct device, whereas
> > > > > all the userspace visible stuff attached to a drm_device can long
> > > > > outlive that one (e.g. after a hotunplug while userspace has open
> > > > > files and mmap'ed buffers). Give people what they want, but with more
> > > > > correctness.
> > > > >
> > > > > Mostly copied from devres.c, with types adjusted to fit drm_device and
> > > > > a few simplifications - I didn't (yet) copy over everything. Since
> > > > > the types don't match code sharing looked like a hopeless endeavour.
> > > > >
> > > > > For now it's only super simplified, no groups, you can't remove
> > > > > actions (but kfree exists, we'll need that soon). Plus all specific to
> > > > > drm_device ofc, including the logging. Which I didn't bother to make
> > > > > compile-time optional, since none of the other drm logging is compile
> > > > > time optional either.
> > > > >
> > > > > One tricky bit here is the chicken&egg between allocating your
> > > > > drm_device structure and initiliazing it with drm_dev_init. For
> > > > > perfect onion unwinding we'd need to have the action to kfree the
> > > > > allocation registered before drm_dev_init registers any of its own
> > > > > release handlers. But drm_dev_init doesn't know where exactly the
> > > > > drm_device is emebedded into the overall structure, and by the time it
> > > > > returns it'll all be too late. And forcing drivers to be able clean up
> > > > > everything except the one kzalloc is silly.
> > > > >
> > > > > Work around this by having a very special final_kfree pointer. This
> > > > > also avoids troubles with the list head possibly disappearing from
> > > > > underneath us when we release all resources attached to the
> > > > > drm_device.
> > > >
> > > > This is all a very good idea ! Many subsystems are plagged by drivers
> > > > using devm_k*alloc to allocate data accessible by userspace. Since the
> > > > introduction of devm_*, we've likely reduced the number of memory leaks,
> > > > but I'm pretty sure we've increased the risk of crashes as I've seen
> > > > some drivers that used .release() callbacks correctly being naively
> > > > converted to incorrect devm_* usage :-(
> > > >
> > > > This leads me to a question: if other subsystems have the same problem,
> > > > could we turn this implementation into something more generic ? It
> > > > doesn't have to be done right away and shouldn't block merging this
> > > > series, but I think it would be very useful.
> > >
> > > It shouldn't be that hard to tie this into a drv_m() type of a thing
> > > (driver_memory?)
> > >
> > > And yes, I think it's much better than devm_* for the obvious reasons of
> > > this being needed here.
> >
> > There's two reasons I went with copypasta instead of trying to share code:
> > - Type checking, I definitely don't want people to mix up devm_ with
> > drmm_. But even if we do a drv_m that subsystems could embed we do
> > have quite a few different types of component drivers (and with
> > drm_panel and drm_bridge even standardized), and I don't want people
> > to be able to pass the wrong kind of struct to e.g. a managed
> > drmm_connector_init - it really needs to be the drm_device, not a
> > panel or bridge or something else.
> >
> > - We could still share the code as a kind of implementation/backend
> > library. But it's not much, and with embedding I could use the drm
> > device logging stuff which is kinda nice. But if there's more demand
> > for this I can definitely see the point in sharing this, as Laurent
> > pointed out with the tiny optimization with not allocating a NULL void
> > * that I've done (and screwed up) it's not entirely trivial code.
> >
>
> My 2c as they say, although closer to a brain dump :-)
>
> On one hand the drm_device has an embedded struct device. On the other
> drm_device preserves state which outlives the embedded struct device.
>
> Would it make sense to keep drm_device better related to the
> underlying device? Effectively moving the $misc state to drm_driver.
> This idea does raise another question - struct drm_driver unlike many
> other struct $foo_driver, does not embedded device_driver :-(
> So if one is to cover the above two, then the embedding concerns will
> be elevated.
drm_driver isn't a bus device driver in the linux driver model sense,
but an uapi thing that sits on top of some underlying device. So maybe
better to rename drm_driver to drm_interface_driver, and drm_device to
drm_interface. But that would be giantic churn and probably lots of
confusion. We do require a link between drm_device->struct device
nowadays, but that's just to guarantee userspace can find the
drm_device in sysfs somewhere and make sense of what it actually
drives.
That's also why the lifetimes for the two things are totally
different. The device driver an all it's resources are tied to the
underlying physical device, and resources can be released when that
driver<->device link is broken (either unbind or hotunplug). That's
what devm_ does. The drm_driver/drm_device otoh is tied to the
userspace api, and can only disappear once all the userspace handles
have been cleaned up and released. And we have an enormous amount of
those, with all the mmaps, and shared fd for dma-buf, sync_file,
synobj and whatever else. The drm_device can only be cleaned up once
userspace has closed all these things, or we'll go boom somewhere. The
only connection is that the userspace interface drives the underlying
hw (as long as it's still there) and the hw side holds a reference on
the uapi side (drm_dev_get/put) to make sure the userspace side
doesn't go poof and disappear when no one has the /dev node open :-)
But aside from these links they're completely separate worlds, and
mixing up the lifetimes results in all kinds of bad things happening.
Ofc normally these two things exist at the same time, but hotunplug
makes things very interesting here. And traditionally we've handled it
badly, if at all in drm.
> WRT type safety, with the embedded work sorted, one could introduce
> trivial helpers for drmm_connector_init and friends.
>
> In another email you've also raised the question of API diversity and
> reviews, I believe. IMHO one could start with a bare minimum set and
> extend as needed.
> Based on the prompt response from Greg, I suspect review won't be an issue.
The drmm_ stuff in here is the bare minimum we need to get started. I
expect lots of stuff will be added, but those are all just going to be
convenience functions on top of the drmm_add_action primitive.
> If people agree with my analysis and considering the size/complexity
> of drm_device <> drm_driver reshuffle, we could add a TODO task.
> I suspect the underlying work will be larger than the current 52 patch
> set, so doing it in one go will be PITA.
I'm not following what you want to shuffle. drm_driver is entirely
static and kinda global, drm_device is the per-instance structure we
have. And here we mean per-userspace uapi interface instance. So I
guess I'm confused what you want to do?
-Daniel
>
> HTH
> Emil
>
> * Based on the following quick greps
> $git grep -W "struct [a-zA-Z0-9-]*_driver {" -- include/ | grep -w
> "struct device_driver\>.*;" | wc -l
> 56
> $git cgrep "struct [a-zA-Z0-9-]*_driver {" -- include/ | wc -l
> 71
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list