Alternative approach to solve the deferred probe

Wed Oct 21 10:20:42 PDT 2015

On Wed, Oct 21, 2015 at 07:55:29PM +0300, Grygorii Strashko wrote:
> On 10/21/2015 06:36 PM, Frank Rowand wrote:
> > The above is currently the last point for probe to succeed or defer
> > (until possibly, as you mentioned, module loading resolves the defer).
> > If a probe defers above, it will defer again below.  The set of defers
> > should be exactly the same above and below.
> > 
> 
> Unfortunately this is not "the last point for probe to succeed or defer".

Of course it isn't.  Being pedantic, there's actually no such thing,
because the point that the kernel as finished booting can never actually
be determined with things like modules being present.  That's something
I've acknowledged from the start of this.

> There are still a bunch of drivers in Kernel which will be probed at late_initcall() level.
> (like ./drivers/net/ethernet/ti/cpsw.c => late_initcall(cpsw_init);
> Yes - they probably need to be updated to use module_init(), but that's what
> we have now). Those drivers will re-trigger deferred device probing if their
> probe succeeded.

Maybe this particular late_initcall() which triggers off the deferred
probing should be moved to its own really_late_initcall() which happens
as the very last thing - I think this is intended to run after everything
else has had a chance to probe once.

> As result, it is impossible to say when will it happen the 
> "final round of deferred device probing" :( and final list of drivers
> which was "deferred forever" will be know only when kernel exits to
> User space  ("deferred forever" - before loading modules).
> 
> May be, we also can consider adding debug_fs entry which can be used to
> display actual state of deferred_probe_pending_list? 

There are complaints in this thread about the existing deferred probing
implementation being hard to debug - where it's known that a device
has deferred, but it's not known why that happened.

That would be solved by my proposal, as this final round of probing
before entering userspace after _all_ normal device probes have been
attempted once and then we've tried to satisfy the deferred probe
(okay, that's what it's _supposed_ to be - and as it takes three lines
to write it, you'll excuse me if I just use the abbreviated "final
round of deferred probe" which is much shorter - but remember that
the long version is what I actually mean) would produce a list of
not only the devices that failed to probe, but also the cause of the
deferred probes.

My proposal would ensure that subsystems are happier to add these
prints, because in the normal scenario where we have deferred probing,
we're not littering the console log with lots of useless failure
messages which make people stop and think "now did device X probe?"
It also means scripts in our boot farms can more effectively analyse
the log and determine whether the boot was actually successful and
contained no errors.

Merely printing the list of devices which have been deferred is next
to useless.  The next question will always be "why did device X defer?"
and if that can't be answered, it means people having to spend a long
time adding lots of printks to the kernel at lots of -EPROBE_DEFER
returning sites or in the relevant drivers, tracing through the code
back towards the -EPROBE_DEFER sites to try and track it down.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.