[Intel-gfx] [PATCH 03/51] drm: add managed resources tied to drm_device
Daniel Vetter
daniel at ffwll.ch
Wed Feb 26 10:21:18 UTC 2020
On Wed, Feb 26, 2020 at 10:21:17AM +0100, Andrzej Hajda wrote:
> On 25.02.2020 16:03, Daniel Vetter wrote:
> > On Tue, Feb 25, 2020 at 11:27 AM Andrzej Hajda <a.hajda at samsung.com> wrote:
> >> Hi Daniel,
> >>
> >>
> >> The patchset looks interesting.
> >>
> >>
> >> On 21.02.2020 22:02, Daniel Vetter wrote:
> >>> We have lots of these. And the cleanup code tends to be of dubious
> >>> quality. The biggest wrong pattern is that developers use devm_, which
> >>> ties the release action to the underlying struct device, whereas
> >>> all the userspace visible stuff attached to a drm_device can long
> >>> outlive that one (e.g. after a hotunplug while userspace has open
> >>> files and mmap'ed buffers). Give people what they want, but with more
> >>> correctness.
> >>
> >> I am not familiar with this stuff, so forgive my stupid questions.
> >>
> >> Is it documented how uapi should behave in such case?
> >>
> >> I guess the general rule is to return errors on most ioctls (ENODEV,
> >> EIO?), and wait until userspace releases everything, as there is not
> >> much more to do.
> >>
> >> If that is true what is the point of keeping these structs anyway -
> >> trivial functions with small context data should do the job.
> >>
> >> I suspect I am missing something but I do not know what :)
> > We could do the above (also needs unmapping of all mmaps, so userspace
> > then gets SIGSEGV everywhere) and watch userspace crash&burn.
> > Essentially if the kernel can't do this properly, then there's no hope
> > that userspace will be any better.
>
>
> We do not want to crash userspace. We just need to tell userspace that
> the kernel objects userspace has references to are not valid.
>
> For this, two mechanisms should be enough:
>
> - signal hot-unplug,
>
> - report error (ENODEV for example) on any userspace requests (ioctls)
> on invalid objects.
>
> Expecting userspace to handle ioctl errors properly seems fair.
The trouble is that while it may be fair, practice says it's just not going
to happen.
> Regarding mmap, I am not sure how to properly handle disappearing
> devices, but this is a common problem regardless of which solution we use.
That would need a signal handler wrapped around every mmap access, which
doesn't compose across libraries, so it is essentially impossible.
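To make the composability problem concrete, here is a minimal userspace sketch of what "a signal handler around every mmap access" would look like. Everything in it is an assumption for illustration: guarded_read_byte() and demo_read_after_unplug() are hypothetical names, and the "unplug" is simulated with mprotect(PROT_NONE) on an anonymous mapping rather than a real device mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One process-wide jump slot. A second library doing the same thing
 * would fight over the handler, which is the composability problem. */
static sigjmp_buf fault_env;
static volatile sig_atomic_t guard_armed;

static void fault_handler(int sig)
{
	if (guard_armed)
		siglongjmp(fault_env, 1);
	/* Not our access: fall back to the default action. */
	signal(sig, SIG_DFL);
	raise(sig);
}

/* Hypothetical helper: read one byte, return 0 on success, -1 on fault. */
int guarded_read_byte(const volatile unsigned char *p, unsigned char *out)
{
	struct sigaction sa, old_bus, old_segv;
	int ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old_bus);
	sigaction(SIGSEGV, &sa, &old_segv);

	if (sigsetjmp(fault_env, 1) == 0) {
		guard_armed = 1;
		*out = *p;		/* may fault */
	} else {
		ret = -1;		/* we got here via siglongjmp */
	}
	guard_armed = 0;

	/* Put back whatever handlers were installed before us. */
	sigaction(SIGBUS, &old_bus, NULL);
	sigaction(SIGSEGV, &old_segv, NULL);
	return ret;
}

/* Simulated unplug: the mapping becomes inaccessible under our feet. */
int demo_read_after_unplug(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char value = 0;
	unsigned char *map;
	int before, after;

	map = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	assert(map != MAP_FAILED);
	map[0] = 42;

	before = guarded_read_byte(map, &value);	/* succeeds */
	mprotect(map, page, PROT_NONE);			/* "device gone" */
	after = guarded_read_byte(map, &value);		/* faults, caught */

	munmap(map, page);
	return (before == 0 && after == -1) ? 0 : -1;
}
```

Note that every single access pays for two sigaction() round trips, and the scheme only holds as long as nobody else in the process installs a SIGBUS/SIGSEGV handler in between.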
Note that e.g. GL's robustness extensions work exactly like this here
too: GPU dies, kernel kills all your objects and contexts and everything.
But the driver keeps "working". The only way to get information that
everything is actually dead is by querying the robustness extension, which
then will tell you what's happened.
Again this is because it's impossible to make sure userspace actually
checks error codes everywhere. It's also prohibitively expensive. vk goes
as far as outright removing all error validation (at least as much as
possible).
> > Hence the idea is that we keep everything userspace facing still
> > around, except it doesn't do much anymore. So connectors still there,
> > but they look disconnected.
>
>
> It looks like lying to userspace that physical connectors still exist.
> If we want to lie we need good reason for that. What is that reason?
>
> Why not just tell userspace the connectors are gone?
Userspace sucks at handling hotunplugged connectors. Most of it is special
case code for DP MST connectors only.
> > Userspace can then hopefully eventually
> > get around to processing the sysfs hotunplug event and remove the
> > device from all its lists. So the long-term idea is that a lot of stuff
> > keeps working, except the driver doesn't talk to the hardware anymore.
> > And we just sit around waiting for userspace to clean things up.
>
>
> What does "a lot of stuff keeps working" mean? What can a drm driver do
> without hardware? Could you show some examples?
Nothing will "work", the goal is simply for userspace to not explode in
fire and take the entire desktop down with it.
> > I guess once we have a bunch of the panel/usb drivers converted over
> > we could indeed document how this is all supposed to work from an uapi
> > pov. But right now a lot of this is all rather aspirational, I think
> > only the recent simple display pipe based drivers implement this as
> > described above.
> >
> >>> Mostly copied from devres.c, with types adjusted to fit drm_device and
> >>> a few simplifications - I didn't (yet) copy over everything. Since
> >>> the types don't match code sharing looked like a hopeless endeavour.
> >>>
> >>> For now it's only super simplified, no groups, you can't remove
> >>> actions (but kfree exists, we'll need that soon). Plus all specific to
> >>> drm_device ofc, including the logging. Which I didn't bother to make
> >>> compile-time optional, since none of the other drm logging is compile
> >>> time optional either.
> >>
> >> I saw in the v1 thread that copy/paste is OK and merging back devres and
> >> drmres can be done later, but experience shows that after a short time
> >> things get de-synchronized and the merging process becomes quite painful.
> >>
> >> On the other hand I guess it shouldn't be difficult to split devres into
> >> a consumer-agnostic core and "struct device" helpers and then use the core
> >> in drm.
> >>
> >> For example currently devres uses two fields from struct device:
> >>
> >> spinlock_t devres_lock;
> >> struct list_head devres_head;
> >>
> >> Let's put them into a separate struct:
> >>
> >> struct devres {
> >>
> >> spinlock_t lock;
> >> struct list_head head;
> >>
> >> };
> >>
> >> And embed this struct into "struct device".
> >>
> >> Then convert all core devres functions to take "struct devres *"
> >> argument instead of "struct device *" and then these core functions can
> >> be usable in drm.
> >>
> >> Looks like a quite simple separation of the abstraction (devres) and its
> >> consumer (struct device).
> >>
> >> After such split one could think about changing name devres to something
> >> more reliable.
> > There was a long discussion on v1 exactly about this, Greg's
> > suggestion was to "just share a struct device". So we're not going to
> > do this here, and the struct device seems like slight overkill and not
> > a good enough fit here.
>
>
> But my proposition is different, I want to get rid of "struct device"
> from devres core - devres has nothing to do with device, it was bound to
> it probably because it was convenient as device was the only client of
> devres (I guess). Now if we want to have more devres clients abstracting
> out devres from device seems quite natural. This way we will have proper
> abstractions without code duplication.
>
> Examples of devres related code according to my proposition:
>
> // devres core
>
> void devres_add(struct devres_head *dh, void *res)
> {
> struct devres *dr = container_of(res, struct devres, data);
> unsigned long flags;
>
> spin_lock_irqsave(&dh->lock, flags);
> add_dr(dh, &dr->node);
> spin_unlock_irqrestore(&dh->lock, flags);
> }
>
> // device devres helper (non core)
>
> struct clk *devm_clk_get(struct device *dev, const char *id)
> {
> struct clk **ptr, *clk;
>
> ptr = devres_alloc(devm_clk_release, sizeof(*ptr), GFP_KERNEL);
> if (!ptr)
> return ERR_PTR(-ENOMEM);
>
> clk = clk_get(dev, id);
> if (!IS_ERR(clk)) {
> *ptr = clk;
> devres_add(&dev->devres, ptr);
> } else {
> devres_free(ptr);
> }
>
> return clk;
> }
>
>
> Changes are cosmetic. But then you can easily add devres to drmdev:
>
> struct drm_device {
>
> ...
>
> + struct devres_head devres;
>
> };
>
> // then copy/modify from your patch:
>
> +void *drmm_kmalloc(struct drm_device *dev, size_t size, gfp_t gfp)
> +{
> + struct drmres *dr;
> +
> + dr = alloc_dr(NULL, size, gfp, dev_to_node(dev->dev));
> + if (!dr)
> + return NULL;
> + dr->node.name = "kmalloc";
> +
> + devres_add(&dev->devres, dr); // the only change is here
> +
> + return dr->data;
> +}
>
>
> Btw, the reimplemented add_dr differs from the original add_dr and is similar
> to the original devres_add, so your implementation already differs from the
> original one; merging these two back will be painful :)
Oh I know, I guess I could go more into details about why exactly. One
reason is that I want type-checking, so struct drm_device * instead of
something else. At least for the userspace callbacks. That's going to be
tough with your approach - kmalloc is easy, it's the _add_action which
gets nasty with the type checking.
The other is that we can use drm debugging, which gives us some nice
consistency within drm at least.
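To illustrate the type-checking point, here is a minimal userspace sketch of a typed release-action list. It is not the code from the patch: struct mock_drm_device, mock_release_t, mock_drmm_add_action() and mock_drmm_release_all() are hypothetical stand-ins, and plain malloc/free replaces the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_action;

/* Hypothetical stand-in for struct drm_device, not the kernel type. */
struct mock_drm_device {
	struct mock_action *actions;	/* LIFO list of release actions */
	int released;			/* bumped by the demo callback */
};

/* Typed callback: the compiler checks that callers hand in a function
 * taking the device type, not a void pointer or some other struct. */
typedef void (*mock_release_t)(struct mock_drm_device *dev, void *data);

struct mock_action {
	struct mock_action *next;
	mock_release_t release;
	void *data;
};

/* Register a release action; returns 0 on success, -1 on allocation failure. */
int mock_drmm_add_action(struct mock_drm_device *dev,
			 mock_release_t release, void *data)
{
	struct mock_action *a;

	assert(release != NULL);
	a = malloc(sizeof(*a));
	if (!a)
		return -1;

	a->release = release;
	a->data = data;
	a->next = dev->actions;		/* newest first, so release is LIFO */
	dev->actions = a;
	return 0;
}

/* Run and free all actions in reverse order of registration. */
void mock_drmm_release_all(struct mock_drm_device *dev)
{
	while (dev->actions) {
		struct mock_action *a = dev->actions;

		dev->actions = a->next;
		a->release(dev, a->data);
		free(a);
	}
}

/* Example callback with the typed signature. */
void mock_free_buffer(struct mock_drm_device *dev, void *data)
{
	free(data);
	dev->released++;
}

/*
 * A callback written for a generic core would take "void *owner" as its
 * first argument. Passing such a function to mock_drmm_add_action() is an
 * incompatible-pointer-type diagnostic, which is the check a shared
 * untyped core would lose.
 */
```

With a shared core the callback would have to take an untyped owner pointer (or every caller would need a cast wrapper), so a callback written for the wrong owner type would compile silently.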
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch