[Intel-gfx] [PATCH 19/51] drm: Cleanups after drmm_add_final_kfree rollout
Daniel Vetter
daniel.vetter at ffwll.ch
Thu Apr 2 09:50:00 UTC 2020
On Thu, Apr 2, 2020 at 11:39 AM Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> Hi Daniel,
>
> On Thu, Apr 02, 2020 at 07:17:40AM +0200, Daniel Vetter wrote:
> > On Thu, Apr 2, 2020 at 2:50 AM Laurent Pinchart wrote:
> > > On Mon, Mar 23, 2020 at 03:49:18PM +0100, Daniel Vetter wrote:
> > > > A few things:
> > > > - Update the example driver in the documentation.
> > > > - We can drop the old kfree in drm_dev_release.
> > > > - Add a WARN_ON check in drm_dev_register to make sure everyone calls
> > > > drmm_add_final_kfree and there's no leaks.
> > > >
> > > > v2: Restore the full cleanup, I accidentally left some moved code
> > > > behind when fixing the bisectability of the series.
> > > >
> > > > Acked-by: Sam Ravnborg <sam at ravnborg.org>
> > > > Acked-by: Thomas Zimmermann <tzimmermann at suse.de>
> > > > Cc: Sam Ravnborg <sam at ravnborg.org>
> > > > Cc: Dan Carpenter <dan.carpenter at oracle.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > > > ---
> > > > drivers/gpu/drm/drm_drv.c | 12 +++++-------
> > > > 1 file changed, 5 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > index 877ded348b6e..7f9d7ea543a0 100644
> > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > @@ -297,8 +297,6 @@ void drm_minor_release(struct drm_minor *minor)
> > > > *
> > > > * drm_mode_config_cleanup(drm);
> > > > * drm_dev_fini(drm);
> > > > - * kfree(priv->userspace_facing);
> > > > - * kfree(priv);
> > > > * }
> > > > *
> > > > * static struct drm_driver driver_drm_driver = {
> > > > @@ -326,10 +324,11 @@ void drm_minor_release(struct drm_minor *minor)
> > > > * kfree(drm);
> > > > * return ret;
> > > > * }
> > > > + * drmm_add_final_kfree(drm, priv);
> > > > *
> > > > * drm_mode_config_init(drm);
> > > > *
> > > > - * priv->userspace_facing = kzalloc(..., GFP_KERNEL);
> > > > + * priv->userspace_facing = drmm_kzalloc(..., GFP_KERNEL);
> > > > * if (!priv->userspace_facing)
> > > > * return -ENOMEM;
> > > > *
> > > > @@ -837,10 +836,7 @@ static void drm_dev_release(struct kref *ref)
> > > >
> > > > drm_managed_release(dev);
> > > >
> > > > - if (!dev->driver->release && !dev->managed.final_kfree) {
> > > > - WARN_ON(!list_empty(&dev->managed.resources));
> > > > - kfree(dev);
> > > > - } else if (dev->managed.final_kfree)
> > > > + if (dev->managed.final_kfree)
> > > > kfree(dev->managed.final_kfree);
> > > > }
> > > >
> > > > @@ -961,6 +957,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
> > > > if (!driver->load)
> > > > drm_mode_config_validate(dev);
> > > >
> > > > + WARN_ON(!dev->managed.final_kfree);
> > >
> > > That's too aggressive. Drivers freeing their private object in .release()
> > > isn't wrong. I'd even go as far as saying that it should be the norm,
> > > until we manage to find a better way to handle that (which this series
> > > doesn't achieve). Many drivers need to allocate resources at probe time
> > > before they get a chance to init the drm device. Those resources must be
> > > released in the error handling paths of probe. By requiring
> > > drmm_add_final_kfree(), you're making that much more complex. I can't
> > > release those resources in the error path anymore after calling
> > > drmm_add_final_kfree(), or they will be released twice. And I can't rely
> > > on drmm_* to release them in all cases, as the error path may be hit
> > > before touching anything drm-related.
> > >
> > > Until we figure out a good way forward and test it on a significant
> > > number of drivers, let's not add WARN_ON() that will be hit with the
> > > majority of drivers, forcing them to be converted to something that is
> > > clearly half-baked.
> >
> > Hm, is this conjecture, or did you actually hit this WARN_ON with a
> > driver? Because I did audit them all, none should hit this, all are
> > fixed up.
>
> I'm sorry, I should have been clear about that. I hit the issue
> yesterday after rebasing the Xilinx ZynqMP DPSUB driver. I took Sam's
> suggestion to embed struct drm_device instead of allocating it
> dynamically, and then hit the WARN_ON. You're of course not responsible
> for a driver that is still out-of-tree. I then looked at how to convert
> other drivers I work on in a similar way (rcar-du and omapdrm in
> particular), and realized it could actually make cleanup more complex to
> always enforce usage of managed memory for everything.
>
> I apologize for the harsh tone of the previous e-mail, you certainly
> didn't deserve that (even more so as I've only reviewed the initial
> version of the series).
>
> > Also, I'm now actually going through all the places where I've added
> > the drmm_add_final_kfree and remove it again, they are _all_ about 5
> > lines after the kzalloc that allocates the driver structure which has
> > drm_device embedded.
> >
> > So I'd like to understand where you get your seemingly rather sure
> > conviction from that this is a horrible mistake here ...
>
> Overall this feature simplifies lots of drivers, and, even more
> importantly, removes lots of actual or potential bugs, so it's far from
> horrible. My words were too harsh, and I apologize for that again.
>
> I however still think that before enforcing a model where everything has
> to be managed, we need to try and deploy it to more drivers, especially
> ones that initialize the drm_device fairly late in the probe process.
> That's where it gets painful, as the unwind-style cleanup path needs to
> handle memory free, but as soon as drmm_add_final_kfree() is called,
> some of the code right at the bottom of the unwind stack suddenly needs
> to be skipped. In some cases we can rearrange the code to initialize the
> drm_device earlier, before doing much other initialization that would
> need a cleanup unwind, but it's not always possible. I'm thinking in
> particular about drivers that would expose multiple interfaces and want
> to embed the data structures that correspond to all of them, or about
> drivers based on the component framework (or similar systems). For these
> drivers a manual .release() is needed, and while the current
> implementation of the managed helpers doesn't prevent that, it forbids
> embedding drm_device in situations where there is nothing to final_kfree.
I'd need to look in detail at your code, but a few things I've seen
from all other drivers:
- The unroll code shouldn't ever get more complicated. Before you call
drm_dev_init you have to explicitly kfree() your own allocation that
contains drm_device. After that call you have to use drm_dev_put. Adding
drmm_add_final_kfree has changed none of that: whether you use
drmm_add_final_kfree or an explicit kfree in your drm_driver->release
callback, that kfree happens when the final drm_dev_put() is called.
- Wrt why is this mandatory? If you unload your driver with KASAN
enabled and have not set the final_kfree pointer, but instead free the
drm_device at the last step in your drm_driver->release hook, you'll
splat. That's why my patch series was so tedious and had to change
everything in a multi-step process, and why I didn't want to blow it
up to 100 patches to also include the removal of drmm_add_final_kfree.
I'm working on that right now, it's somewhere between 40-50 patches on
top (ok so not quite all of them are required, I've done a handful of
drive-by cleanups in some drivers too). So yeah hopefully real soon
the drmm_add_final_kfree should be gone again.
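
To illustrate the first point above, here is a minimal userspace sketch
of the two-phase unwind. It is not kernel code: every mock_* name is a
hypothetical stand-in (mock_dev_init for drm_dev_init,
mock_add_final_kfree for drmm_add_final_kfree, mock_dev_put for
drm_dev_put), and the refcounting is reduced to a plain integer. It only
shows that the container is freed exactly once on every path: by an
explicit free before init, and by the final put after it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mock of struct drm_device: only what the sketch needs. */
struct mock_drm_device {
	int refcount;
	void *final_kfree;	/* container to free on the final put */
};

/* Driver structure with the device embedded, as discussed above. */
struct mock_priv {
	struct mock_drm_device drm;
	int hw_state;
};

static int mock_frees;	/* how many times a container was freed */

static void mock_dev_init(struct mock_drm_device *dev)
{
	dev->refcount = 1;
	dev->final_kfree = NULL;
}

static void mock_add_final_kfree(struct mock_drm_device *dev, void *container)
{
	dev->final_kfree = container;
}

/* Drop a reference; the final put frees the registered container. */
static void mock_dev_put(struct mock_drm_device *dev)
{
	void *container;

	if (--dev->refcount > 0)
		return;

	container = dev->final_kfree;	/* read before freeing dev's memory */
	if (container) {
		free(container);
		mock_frees++;
	}
}

/*
 * fail_at: 0 = success, 1 = fail before device init,
 *          2 = fail after device init.
 */
static int mock_probe(int fail_at, struct mock_priv **out)
{
	struct mock_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;

	if (fail_at == 1) {
		/* Before init: the caller still owns the allocation. */
		free(priv);
		mock_frees++;
		return -1;
	}

	mock_dev_init(&priv->drm);
	mock_add_final_kfree(&priv->drm, priv);

	if (fail_at == 2) {
		/* After init: only drop the reference, never free directly. */
		mock_dev_put(&priv->drm);
		return -1;
	}

	*out = priv;
	return 0;
}
```

Calling mock_probe() with fail_at set to 1 or 2 frees the container
once in either case, and a successful probe followed by a single
mock_dev_put() frees it once as well.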
Hope this explains a bit what's going on here, I'm happy to look at
your driver code specifically and come up with ideas how to structure
it. Thus far (I think about 25 drivers in with my devm_drm_dev_alloc
roll-out, which will clean this all up for good) I've not encountered
any surprises.
-Daniel
>
> > > > +
> > > > if (drm_dev_needs_global_mutex(dev))
> > > > mutex_lock(&drm_global_mutex);
> > > >
>
> --
> Regards,
>
> Laurent Pinchart
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch