[PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

Thu Feb 18 23:53:51 UTC 2016

> -----Original Message-----
> From: daniel.vetter at ffwll.ch [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Thursday, February 18, 2016 6:11 PM
> To: Lukas Wunner
> Cc: dri-devel; platform-driver-x86 at vger.kernel.org; intel-gfx; Ben Skeggs;
> Deucher, Alexander
> Subject: Re: [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but
> its driver isn't
> 
> On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas at wunner.de> wrote:
> > Hi,
> >
> > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
> >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas at wunner.de>
> wrote:
> >> >
> >> >> Ok, makes sense. I still think adding the check to the client_register
> >> >> function would be good, just as a safety measure.
> >> >
> >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> >> > causes me pain in the stomach. It's surprising for drivers which
> >> > just don't need it at the moment (amdgpu and snd_hda_intel) and
> >> > it feels like overengineering and pampering driver developers
> >> > beyond reasonable measure. Also while the single existing check is
> >> > cheap, we might later on add checks that take more time and slow
> >> > things down.
> >>
> >> It's motivated by Rusty's API Manifesto:
> >>
> >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
> >
> > Interesting, thank you.
> >
> >
> >> With the mandatory check in _register we reach level 5 - it'll blow up
> >> at runtime when we try to register it.
> >
> > The manifesto says "5. Do it right or it will always break at runtime".
> >
> > However even if we add a
> WARN_ON(vga_switcheroo_client_probe_defer(pdev))
> > to register_client(), it will not *always* spew a stacktrace but only on
> > the machines this concerns (MacBook Pros). Since failure to defer breaks
> > GPU switching, level 5 is already reached. Chances are this won't go
> > unnoticed by the user.
> 
> If we fail the register hopefully the driver checks for that and might
> blow up somewhere in untested error handling code. But there's a good
> chance it'll fail (we can encourage that more by adding must_check to
> the function declaration). In that case you get a nice bug report with
> splat from users hitting this.
> 
> Without this it'll silently work, and all the reports you get is
> "linux is shit, gpu switching doesn't work".
> 
> In both cases it can sometimes succeed, which is not great indeed. But
> I'm trying to fix that by injection EDEFER points artificially
> somehow. Not yet figured out that one.
> 
> But irrespective of the precise failure mode making the defer check
> mandatory by just including it in _register() is better since it makes
> it impossible to forget to call it when its needed. So imo clearly the
> more robust API. And that's my metric for evaluating new API - how
> easy/hard is it to abuse/get wrong.
> 
> >> For more context: We have tons of fun with EPROBE_DEFER handling
> >> between i915 and snd-hda
> >
> > I don't understand, there is currently not a single occurrence of
> > EPROBE_DEFER in i915, apart from the one I added.
> >
> > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> > resides.
> >
> > Is the fun with EPROBE_DEFER handling caused by the lack thereof?
> 
> Yes, there's one instance where i915 shoudl defer missing. The real
> trouble is that snd-hda has some really close ties with i915, and
> resolves those with probe-defer. And blows up all the time since we
> started using this, and with hdmi/dp you really always have to test
> both together in CI, snd-hda is pretty much a part of the intel gfx
> driver nowadays. Deferred probing is ime real trouble.

To further complicate things, AMD dGPUs have HDA audio on board as well.

Alex