etnaviv fails probe early, but succeeds after rmmod && modprobe
Saravana Kannan
saravanak at google.com
Mon Apr 19 18:36:49 UTC 2021
On Mon, Apr 19, 2021 at 11:19 AM John Stultz <john.stultz at linaro.org> wrote:
>
> On Sat, Apr 17, 2021 at 3:52 AM Ing. Josua Mayer <josua.mayer at jm0.eu> wrote:
> > Am 17.04.21 um 05:46 schrieb John Stultz:
> > > On Fri, Apr 16, 2021 at 4:13 AM Lucas Stach <l.stach at pengutronix.de> wrote:
> > >> Am Freitag, dem 16.04.2021 um 12:57 +0200 schrieb Ing. Josua Mayer:
> > >>> Hi Lucas,
> > >>>
> > >>> anatop_regulator is indeed a module currently,
> > >>> this is one of the changes introduced in their jump from kernel 5.9 to
> > >>> 5.10 - and has even landed in buster through backports ...
> > >>>
> > >>> I wonder how / where those timeouts are specified.
> > >>> Regarding the order of module loading there is not much I can do, it is
> > >>> already the second module inserted by the initramfs.
> > >>
> > >> This looks like a kernel bug to me. If no timeout is given on the
> > >> command line, the status is immediately considered as timed-out after
> > >> the initcalls are done, as the code doesn't differentiate between "no
> > >> timeout given" and "timeout expired" at that point.
> > >>
> > >> CC'ing John Stultz, who touched that code last.
> > >
> > > Yea, sadly my attempts to try to stretch the default timeouts so this
> > > wouldn't happen ended up causing problems for the "optional links"
> > > case, where folks want the driver core to stop deferring and return an
> > > error for the bits that aren't present. So we had to back out most of
> > > those changes (so yes, I touched it last, but unfortunately had to
> > > put things mostly back the way they were).
> > >
> > Thank you for your comments!
> > I am sad to hear that adjusting timeouts was not a path of success ...
> >
> > > Personally, I think the implicit optional link concept in dts was a
> > > mistake, as I think having some explicit notation would have made
> > > things work a lot better since the timeout solution does not feel
> > > ideal for anyone, but I also am (happily) not the expert there, so I
> > > probably shouldn't judge. :)
> > >
> > > In the end, it seems the fw_devlink logic Saravana is working on is
> > > really the better solution. I know he's getting closer to being able
> > > to set it as the default, so you might check that out?
> > >
> > > thanks
> > > -john
> > >
> >
> > So from all these pointers I finally played with the
> > deferred_probe_timeout kernel parameter. Maybe something just needed
> > more time? Well ... for some reason passing 20 made it so that etnaviv
> > eventually probes successfully!
> >
> > I am attaching the full dmesg for reference - note that all modules are
> > in initramfs only for debugging purposes, this is not the default debian
> > split.
> >
> > Is there really a difference between specifying and not specifying
> > deferred_probe_timeout which is described as a debugging feature?
>
> The timeout value just specifies how long after init starts that
> modules with missing dependency links will keep returning EPROBE_DEFER.
> After that, the missing links will return ETIMEDOUT: if the link is
> optional the driver will be able to load ok, but if the link is not
> optional, the module load will fail.
>
> Unfortunately with modules, it's easy for a dependency to be loaded
> late or from storage mounted after init, so the timeout just allows
> more time for the dependencies to be loaded. However, it also means
> modules with optional links have to wait around a bit longer before
> they give up on optional dependencies that will never show up (and in
> some cases, that delay can cause other problems - which is why the
> default timeout couldn't be extended to something more reasonable).
>
> So again, the better solution is Saravana's fw_devlink which uses the
> DTS graph to help load modules in the proper order to avoid missing
> dependencies.
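
For anyone following along, the rule John describes above can be modelled
roughly as below. This is only a toy sketch in shell, not the driver core
code: probe_outcome and its three arguments are made-up names for
illustration, and the real logic has more cases than this.

```shell
# Toy model of the deferral rule described above; NOT kernel code.
# probe_outcome ELAPSED TIMEOUT OPTIONAL   (all names are hypothetical)
#   ELAPSED  - seconds since init started
#   TIMEOUT  - deferred_probe_timeout value in seconds
#   OPTIONAL - "yes" if the missing supplier is optional, otherwise "no"
probe_outcome() {
    elapsed=$1
    timeout=$2
    optional=$3
    if [ "$elapsed" -lt "$timeout" ]; then
        echo defer            # still inside the window: probe is retried later
    elif [ "$optional" = yes ]; then
        echo probe-without    # window over: the driver carries on without it
    else
        echo fail             # window over and the supplier was mandatory
    fi
}
```

In this model a TIMEOUT of 0 leaves no window at all, so a mandatory
supplier that shows up late fails straight away, which is consistent with
what Lucas observed and with a value of 20 giving the modules time to load.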

John,

Thanks for adding me. Yeah, fw_devlink=on is the default in
driver-core-next. I've fixed some minor issues that came up and also
made deferred_probe_timeout smarter (without touching that code much).
No new issues have come up so far. So, hopefully, it'll land this time.

Josua,

If you are using 5.12, you should be able to pass fw_devlink=on on the
kernel command line for it to kick in and order the probes correctly.
You might also want to set fw_devlink.strict=1 if all the iommu and dma
dependencies mentioned in your DT should be treated as mandatory.

Other than that, it should just work if your device is the average
case. You should not need to set any timeouts. I've also handled some
additional corner cases and made deferred_probe_timeout smarter in
driver-core-next. So if you still have issues on 5.12, try testing
linux-next and, as an absolute last resort, try setting the timeout.
The timeout is only needed if you have suppliers that will never have
any modules loaded for them but you still want some of their consumers
to load and work.
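
As a quick sanity check, something like the following can confirm that
the parameters actually reached the kernel. It is a minimal sketch:
cmdline_get is a made-up helper name, and it only assumes that the
running kernel's command line is readable from /proc/cmdline.

```shell
# Print the value of a kernel command-line parameter, or return non-zero
# if it is absent. cmdline_get is a hypothetical helper, not an existing tool.
#   $1 - parameter name, e.g. fw_devlink or fw_devlink.strict
#   $2 - command line to search (defaults to the contents of /proc/cmdline)
cmdline_get() {
    name=$1
    line=${2-$(cat /proc/cmdline)}
    for word in $line; do
        case $word in
            "$name"=*)
                printf '%s\n' "${word#*=}"   # strip everything up to the first '='
                return 0
                ;;
        esac
    done
    return 1
}
```

Running cmdline_get fw_devlink and cmdline_get fw_devlink.strict after a
reboot prints the value of each parameter if it was passed, and prints
nothing (with a non-zero exit status) if it was not.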

Hope that helps.

-Saravana