[Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

Daniel Vetter daniel at ffwll.ch
Thu Feb 18 23:11:23 UTC 2016


On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas at wunner.de> wrote:
> Hi,
>
> On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
>> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas at wunner.de> wrote:
>> >
>> >> Ok, makes sense. I still think adding the check to the client_register
>> >> function would be good, just as a safety measure.
>> >
>> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
>> > causes me pain in the stomach. It's surprising for drivers which
>> > just don't need it at the moment (amdgpu and snd_hda_intel) and
>> > it feels like overengineering and pampering driver developers
>> > beyond reasonable measure. Also while the single existing check is
>> > cheap, we might later on add checks that take more time and slow
>> > things down.
>>
>> It's motivated by Rusty's API Manifesto:
>>
>> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
>
> Interesting, thank you.
>
>
>> With the mandatory check in _register we reach level 5 - it'll blow up
>> at runtime when we try to register it.
>
> The manifesto says "5. Do it right or it will always break at runtime".
>
> However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev))
> to register_client(), it will not *always* spew a stacktrace but only on
> the machines this concerns (MacBook Pros). Since failure to defer breaks
> GPU switching, level 5 is already reached. Chances are this won't go
> unnoticed by the user.

If we fail the register hopefully the driver checks for that and might
blow up somewhere in untested error handling code. But there's a good
chance it'll fail (we can encourage that more by adding must_check to
the function declaration). In that case you get a nice bug report with
splat from users hitting this.

Without this it'll silently work, and all the reports you get is
"linux is shit, gpu switching doesn't work".

In both cases it can sometimes succeed, which is not great indeed. But
I'm trying to fix that by injection EDEFER points artificially
somehow. Not yet figured out that one.

But irrespective of the precise failure mode making the defer check
mandatory by just including it in _register() is better since it makes
it impossible to forget to call it when its needed. So imo clearly the
more robust API. And that's my metric for evaluating new API - how
easy/hard is it to abuse/get wrong.

>> For more context: We have tons of fun with EPROBE_DEFER handling
>> between i915 and snd-hda
>
> I don't understand, there is currently not a single occurrence of
> EPROBE_DEFER in i915, apart from the one I added.
>
> In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> resides.
>
> Is the fun with EPROBE_DEFER handling caused by the lack thereof?

Yes, there's one instance where i915 shoudl defer missing. The real
trouble is that snd-hda has some really close ties with i915, and
resolves those with probe-defer. And blows up all the time since we
started using this, and with hdmi/dp you really always have to test
both together in CI, snd-hda is pretty much a part of the intel gfx
driver nowadays. Deferred probing is ime real trouble.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list