[PATCH v2 00/22] Enable gpu switching on the MacBook Pro

Wed Aug 26 07:01:43 PDT 2015

Hi Daniel,

On Tue, Aug 25, 2015 at 10:21:20AM +0200, Daniel Vetter wrote:
> On Tue, Aug 25, 2015 at 09:36:47AM +0200, Lukas Wunner wrote:
> > I've overhauled locking once more because previously all EDID reads
> > were serialized even on machines which don't use vga_switcheroo at all.
> > Seems it's impossible to fix this with mutexes and still be race-free
> > and deadlock-free, so I've changed vgasr_mutex to an rwlock:
> > https://github.com/l1k/linux/commit/ffa4546d85244a53541eed6799f93e8eea7f29e8
> > 
> > There are a number of vga_switcheroo functions added by Takashi Iwai
> > and yourself which don't lock anything at all, I believe this was in
> > part to avoid deadlocks (vga_switcheroo_runtime_resume_hdmi_audio locks
> > vgasr_mutex, wakes up the GPU with vga_switcheroo_runtime_resume
> > which locks vgasr_mutex once again). After changing vgasr_mutex to an
> > rwlock we can safely use locking in those functions as well:
> > https://github.com/l1k/linux/commit/cdea8f5c92039329dde47cadf20b67c7d9af0d62
> > 
> > With this change, on machines which don't use vga_switcheroo, the
> > overhead in drm_get_edid() is exactly 1 read_lock + 1 read_unlock
> > and EDID reads can happen in parallel. Thierry Reding and Alex Deucher
> > are in favor of adding a separate drm_get_edid_switcheroo() and changing
> > the relevant drivers (i915, nouveau, radeon, amdgpu) to use that in lieu
> > of drm_get_edid() so that drivers which don't use vga_switcheroo avoid
> > the lock_ddc/unlock_ddc calls. Performance-wise these additional calls
> > should be negligible, so I think the motivation can only be better
> > readability/clarity. One should also keep in mind that drivers which
> > don't use vga_switcheroo currently might do so in the future, e.g.
> > if mobile GPUs use a big.LITTLE concept like ARM CPUs already do.
> 
> Just a quick aside: Switching to rwlocks to make lockdep happy is a
> fallacy - lockdep unfortunately doesn't accurately track read lock
> depencies. Which means very likely you didn't fix the locking inversions
> but just made lockdep shut up about them. [...]
> I'd highly suggest you try fixing the vga-switcheroo locking without
> resorting to rw lock primitives - that just means we need to manually
> prove the locking for many cases which is tons of work.

mutex is not the right tool for the job:

To make DDC switching bullet proof you need to block the handler from
unregistering between lock_ddc and unlock_ddc. Solution: ddc_lock grabs
a mutex, ddc_unlock releases it, unregister_handler grabs that same lock.

Downside: All EDID reads are serialized, you can't probe EDID in parallel
if you have multiple displays. Which is ugly.

One could be tempted to "solve" this for non-muxed machines by immediately
releasing the lock in lock_ddc if there's no handler, and by checking in
unlock_ddc if mutex_is_locked before releasing it. But on muxed machines
this would be racy: GPU driver A calls lock_ddc, acquires the mutex, sees
there's no handler registered, releases the mutex. Between this and the
call to unlock_ddc, a handler registers and GPU driver B calls lock_ddc.
This time the mutex is actually locked and when driver A calls unlock_ddc,
it will release it, even though the other driver had acquired it.

Of course one could come up with all sorts of ugly hacks like remembering
for which client the mutex was acquired, but this wouldn't be atomic and
100% bullet proof, unlike rwlocks.

So it seems there's no way with mutexes to achieve parallel EDID reads
and still be race-free and deadlock-free.

rwlocks have the right semantics for this use case: Registering and
unregistering requires write access to the lock, thus happens exclusively.
In lock_ddc we acquire read access to the rwlock and in unlock_ddc we
release it. So the handler can't disappear while we've switched DDC.
We use a mutex ddc_lock to lock the DDC lines but we only acquire that
lock if there's actually a handler. So on non-muxed machines, EDID reads
*can* happen in parallel, there's just the (negligible) overhead of
1 read_lock + 1 read_unlock:
https://github.com/l1k/linux/commit/d0b6f659ae8f4b8b94f4b3ded9fc80e93950d0b3

Best regards,

Lukas