[PATCH 5/9] drm/i915: Associate ACPI connector nodes with connector entries

Wed May 5 10:02:35 UTC 2021

On Wed, May 5, 2021 at 12:28 PM Hans de Goede <hdegoede at redhat.com> wrote:
> On 5/5/21 11:17 AM, Andy Shevchenko wrote:
> > On Wed, May 5, 2021 at 12:07 PM Hans de Goede <hdegoede at redhat.com> wrote:
> >> On 5/4/21 9:52 AM, Andy Shevchenko wrote:
> >>> On Monday, May 3, 2021, Hans de Goede <hdegoede at redhat.com> wrote:
> >
> > ...
> >
> >>>     +               fwnode = device_get_next_child_node(kdev, fwnode);
> >
> >>> Who is dropping reference counting on fwnode ?
> >>
> >> We are dealing with ACPI fwnode-s here and those are not ref-counted, they
> >> are embedded inside a struct acpi_device and their lifetime is tied to
> >> that struct. They should probably still be ref-counted (with the count
> >> never dropping to 0) so that the generic fwnode functions behave the same
> >> anywhere but atm the ACPI nodes are not refcounted, see: acpi_get_next_subnode()
> >> in drivers/acpi/property.c which is the get_next_child_node() implementation
> >> for ACPI fwnode-s.
> >
> > Yes, ACPI currently is exceptional, but fwnode API is not.
> > If you may guarantee that this case won't ever be outside of ACPI
>
> Yes I can guarantee that currently this code (which is for the i915
> driver only) only deals with ACPI fwnodes.
>
> > and
> > even though if ACPI won't ever gain a reference counting for fwnodes,
> > we can leave it as is.
>
> Would it not be better to add fake ref-counting to the ACPI fwnode
> next_child_node() op though. I believe just getting a reference
> on the return value there should work fine; and then all fwnode
> implementations would be consistent ?

But it is effectively already there: ACPI simply has no put/get
callbacks, so at the fwnode level it behaves as you described. So, as a
good pattern, we have to call fwnode_handle_put() always, independently
of the backend, for for_each_child and get_next_child usages.
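
To illustrate the pattern, here is a rough userspace sketch, not kernel
code. Every name in it (struct node, node_get(), node_put(),
node_get_next_child(), find_child(), leaked_refs()) is made up for the
example. It assumes the iterator hands back the next child with a
reference held and drops the reference on the previous one, and that a
backend without get/put callbacks turns both into no-ops:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct node;

struct node_ops {
	struct node *(*get)(struct node *n);	/* may be NULL */
	void (*put)(struct node *n);		/* may be NULL */
};

struct node {
	const struct node_ops *ops;
	int refcount;		/* extra references taken by users */
	int id;
	struct node *first_child;
	struct node *next_sibling;
};

struct node *counted_get(struct node *n) { n->refcount++; return n; }
void counted_put(struct node *n) { n->refcount--; }

/* A refcounting backend, and one without callbacks (no-op get/put). */
const struct node_ops counted_ops = { counted_get, counted_put };
const struct node_ops uncounted_ops = { NULL, NULL };

struct node *node_get(struct node *n)
{
	return (n && n->ops->get) ? n->ops->get(n) : n;
}

void node_put(struct node *n)
{
	if (n && n->ops->put)
		n->ops->put(n);
}

/* Return the next child with a reference held; drop the one on prev. */
struct node *node_get_next_child(struct node *parent, struct node *prev)
{
	struct node *next = prev ? prev->next_sibling : parent->first_child;

	node_get(next);
	node_put(prev);
	return next;
}

/* Early exit: the returned child still carries a reference. */
struct node *find_child(struct node *parent, int id)
{
	struct node *child = NULL;

	while ((child = node_get_next_child(parent, child)))
		if (child->id == id)
			return child;
	return NULL;
}

/* Build parent -> {1, 2, 3}, look up child 2, optionally put it,
 * and report how many references are left over afterwards. */
int leaked_refs(const struct node_ops *ops, int do_put)
{
	struct node c[3] = {
		{ ops, 0, 1, NULL, &c[1] },
		{ ops, 0, 2, NULL, &c[2] },
		{ ops, 0, 3, NULL, NULL },
	};
	struct node parent = { ops, 0, 0, &c[0], NULL };
	struct node *found = find_child(&parent, 2);

	assert(found != NULL);
	if (do_put)
		node_put(found);
	return c[0].refcount + c[1].refcount + c[2].refcount;
}
```

In this model a missing put leaves one reference behind on the counted
backend, while the backend without callbacks hides the same mistake.
That is why the put has to be there regardless of what sits underneath.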

> (note I did not check that the of and swnode code do return
> a reference but I would assume so).

Yes, it's only ACPI that survives w/o reference counting.

> >>> I’m in the middle of a pile of fixes for fwnode refcounting when for_each_child or get_next_child is used. So, please double check you drop a reference.
> >>
> >> The kdoc comments on device_get_next_child_node() / fwnode_get_next_child_node()
> >> do not mention anything about these functions returning a reference.
> >
> > It's possible. I dunno if it had to be done earlier. Sakari?
> >
> >> So I think we need to first make up our mind here how we want this all to
> >> work and then fix the actual implementation and docs before fixing callers.
> >
> > We have already issues, so I prefer not to wait for a documentation
> > update, because for old kernels it will still be an issue.
>
> I wonder if we really have issues though, in practice fwnodes are
> generated from a devicetree or ACPI tables (or by platform code
> adding swnodes) and then these pretty much stick around for ever.

Overlays. Not for ever.

> IOW the initial refcount of 1 is never dropped at least for of-nodes
> and ACPI nodes.

>  I know there are some exceptions like device-tree
> overlays which I guess may also be dynamically removed again, but those
> exceptions are not widely used.

ACPI overlays do get used (at least by two people I know and a few
more who asked questions about them here and there), but luckily they
don't require refcounting (yet?).

> And if we forget to drop a reference in the worst case we have a small
> non-recurring (so not growing) memleak.

And is it good to provoke all kinds of tools (kmemleak, *SANs, etc)? I
do not think so. If we are writing good code it should be good enough.

> Whereas if we start adding
> put() calls everywhere we may end up freeing things which are still
> in use; or dropping refcounts below 0 triggering WARNs in various
> places (IIRC).

Which is good. Then we will discover real issues.

> So it seems the cure is potentially worse than the disease in this
> case.

I tend to disagree with you. How can we go below 0 in this case when we
know that we took a reference? If code somewhere else does that, it is a
problem that has to be fixed on a case-by-case basis.
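
A small userspace sketch of that argument, with a made-up counter
(struct ref, ref_get(), ref_put() and both helper functions are
hypothetical, only loosely modelled on how a refcount underflow check
might behave): as long as every put is paired with a get we took
ourselves, the count returns to where it started, and only a stray put
can trip the underflow check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter with an underflow flag; not a kernel API. */
struct ref {
	int count;
	bool underflow_seen;
};

void ref_get(struct ref *r)
{
	r->count++;
}

void ref_put(struct ref *r)
{
	if (r->count == 0) {
		/* This is where a real implementation would likely warn. */
		r->underflow_seen = true;
		return;
	}
	r->count--;
}

/* Take and drop n references on top of the owner's initial one. */
bool balanced_use_underflows(int n)
{
	struct ref r = { .count = 1, .underflow_seen = false };

	for (int i = 0; i < n; i++) {
		ref_get(&r);
		ref_put(&r);
	}
	assert(r.count == 1);	/* owner's reference is untouched */
	return r.underflow_seen;
}

/* Someone else drops a reference they never took. */
bool stray_put_underflows(void)
{
	struct ref r = { .count = 1, .underflow_seen = false };

	ref_put(&r);	/* owner releases its reference */
	ref_put(&r);	/* stray put with no matching get */
	return r.underflow_seen;
}
```

The balanced helper never sets the flag no matter how often it runs;
the flag only shows up in the second helper, which is the kind of bug
that is worth discovering.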

> So if you want to work on this, then IMHO it would be best to first make
> sure that all the fwnode implementations behave in the same way wrt
> ref-counting, before adding the missing put() calls in various
> places.
>
> And once the behavior is consistent

From the fwnode API point of view it is already consistent,
independently of the layer beneath.

> then we can also document this
> properly making it easier for other people to do the right thing
> when using these functions.

-- 
With Best Regards,
Andy Shevchenko