[RFT PATCH 03/15] drm/ingenic: Call drm_atomic_helper_shutdown() at shutdown time

Doug Anderson dianders at chromium.org
Wed Sep 13 16:23:29 UTC 2023


Hi,

On Wed, Sep 6, 2023 at 1:39 AM Maxime Ripard <mripard at kernel.org> wrote:
>
> On Tue, Sep 05, 2023 at 01:16:08PM -0700, Doug Anderson wrote:
> > > > ---
> > > > This commit is only compile-time tested.
> > > >
> > > > NOTE: this patch touches a lot more than other similar patches since
> > > > the bind() function is long and we want to make sure that we unset
> > > > the drvdata if bind() fails.
> > > >
> > > > While making this patch, I noticed that the bind() function of this
> > > > driver is using "devm" and thus assumes it doesn't need to do much
> > > > explicit error handling. That's actually a bug. As per kernel docs [1]
> > > > "the lifetime of the aggregate driver does not align with any of the
> > > > underlying struct device instances. Therefore devm cannot be used and
> > > > all resources acquired or allocated in this callback must be
> > > > explicitly released in the unbind callback". Fixing that is outside
> > > > the scope of this commit.
> > > >
> > > > [1] https://docs.kernel.org/driver-api/component.html
> > > >
> > >
> > > Noted, thanks.
> >
> > FWIW, I think that at least a few other DRM drivers handle this by
> > doing some of their resource allocation / acquiring in the probe()
> > function and then only doing things in the bind() that absolutely need
> > to be in the bind. ;-)
>
> That doesn't change much. The fundamental issue is that the DRM device
> sticks around until the last application that has an open fd to it
> closes it.
>
> So it doesn't have any relationship with the unbind/remove timing, and
> for all we know it can be there indefinitely, while the application
> continues to interact with the driver.

I spent some time thinking about similar issues recently and, assuming
my understanding is correct, I'd at least partially disagree.

Specifically, I _think_ the only thing that's truly required to remain
valid until userspace closes the last open "fd" is the memory for the
"struct drm_device" itself, right? My understanding is that this is
similar to how "struct device" works. The memory backing a "struct
device" has to live until the last client releases a reference to it
even if everything else about a device has gone away. So if everything
were working perfectly, then when the Linux driver backing the "struct
drm_device" goes away we'd release resources and NULL out a bunch
of stuff in the "struct drm_device" but still keep the actual "struct
drm_device" around since userspace still has a reference. Pretty much
all userspace calls would fail, but at least they wouldn't crash. Is
that roughly the gist?

Assuming that's correct, then _most_ of the resource acquiring /
memory allocation can still happen in the device probe() routine and
can still use devm as long as we do something to ensure that any
resources released are no longer pointed to by anything in the "struct
drm_device".

To make it concrete, I think we want this (feel free to correct). For
simplicity, I'm assuming a driver that _doesn't_ use the component
framework:

a) Linux driver probe() happens. The "struct drm_device" is allocated
in probe() by devm_drm_dev_alloc(). This takes a reference to the
"struct drm_device". The device also acquires resources / allocates
memory.

b) Userspace acquires a reference to the "struct drm_device". Refcount
is now 2 (one from userspace, one from the Linux driver).

c) The Linux driver unbinds, presumably because userspace requested
it. From earlier I think we decided that we can't (by design) block
unbind. Once unbind happens then we shouldn't try to keep operating
the device and the driver should stop running. As part of the unbind,
the remove() is called and also "devm" resources are deallocated. If
any of the things freed are pointed to by the "struct drm_device" then
the code needs to NULL them out at this time. Also we should make sure
that any callback functions that userspace could cause to be invoked
return errors. Our code could go away at any point here since
userspace could "rmmod" our module.

d) Eventually userspace releases the reference and the "struct
drm_device" memory gets automatically freed because it was allocated
by devm_drm_dev_alloc().
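
The steps above can be sketched as a small userspace C model. This is
only an illustration of the refcounting idea, not kernel code: "struct
model_dev" and the model_*() helpers are hypothetical names made up for
this sketch, and the "regs" buffer merely stands in for a devm-managed
resource. It assumes the simple non-component case described in (a)
through (d).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for "struct drm_device"; not the kernel struct. */
struct model_dev {
	int refcount;    /* references held by the driver and by userspace */
	bool unplugged;  /* set once the driver has unbound */
	int *regs;       /* stand-in for a devm-managed resource */
};

int live_devices;    /* number of model_dev objects not yet freed */

/* Drop one reference; free the device memory when the last one goes. */
void model_dev_put(struct model_dev *dev)
{
	if (--dev->refcount == 0) {
		free(dev);
		live_devices--;
	}
}

/* (a) probe: allocate the device, take the driver's reference, get resources. */
struct model_dev *model_probe(void)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->regs = calloc(4, sizeof(*dev->regs));
	if (!dev->regs) {
		free(dev);
		return NULL;
	}
	dev->refcount = 1;
	live_devices++;
	return dev;
}

/* (b) userspace opens the device and takes its own reference. */
void model_open(struct model_dev *dev)
{
	dev->refcount++;
}

/* A userspace-triggered call: fails cleanly once the driver is gone. */
int model_ioctl(struct model_dev *dev, int value)
{
	if (dev->unplugged || !dev->regs)
		return -ENODEV;
	dev->regs[0] = value;
	return 0;
}

/* (c) unbind: release resources, NULL the pointer, drop the driver's ref. */
void model_unbind(struct model_dev *dev)
{
	dev->unplugged = true;
	free(dev->regs);
	dev->regs = NULL;
	model_dev_put(dev);
}

/* (d) userspace closes the last fd and drops the final reference. */
void model_close(struct model_dev *dev)
{
	model_dev_put(dev);
}

/* Walk through (a)-(d) in order, checking the state after each step. */
int lifecycle_demo(void)
{
	struct model_dev *dev = model_probe();      /* (a) refcount is 1 */

	assert(dev != NULL);
	model_open(dev);                            /* (b) refcount is 2 */
	assert(model_ioctl(dev, 42) == 0);
	model_unbind(dev);                          /* (c) refcount is 1 */
	assert(model_ioctl(dev, 42) == -ENODEV);    /* fails, doesn't crash */
	assert(live_devices == 1);                  /* memory still alive */
	model_close(dev);                           /* (d) memory is freed */
	return live_devices;
}
```

The point of the model is that only the device structure itself outlives
unbind: the resource is freed and its pointer cleared in step (c), and
every later call from userspace sees an error rather than a stale
pointer.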


NOTE: potentially some things could be allocated / managed by
drmm_xyz() functions, like drmm_kmalloc(), and that could simplify some
things. However, it's not a panacea for everything. Specifically, once
the Linux driver unbind finishes, the device isn't functional
anymore.



-Doug

