[PATCH] drm/nouveau/fbcon: fix deadlock with FBIOPUT_CON2FBMAP
Peter Wu
peter at lekensteyn.nl
Fri Jul 15 11:10:51 UTC 2016
On Wed, Jul 13, 2016 at 04:57:19PM +0200, Daniel Vetter wrote:
> On Wed, Jul 13, 2016 at 02:40:50PM +0200, Peter Wu wrote:
> > On Wed, Jul 13, 2016 at 11:54:49AM +0200, Daniel Vetter wrote:
> > > On Tue, Jul 12, 2016 at 06:49:34PM +0200, Peter Wu wrote:
> > > > The FBIOPUT_CON2FBMAP ioctl takes a console_lock(). When this is called
> > > > while nouveau was runtime suspended, a deadlock would occur due to
> > > > nouveau_fbcon_set_suspend also trying to obtain console_lock().
> > > >
> > > > Fix this by delaying the drm_fb_helper_set_suspend call. Based on the
> > > > i915 code (which was done for performance reasons though).
> > > >
> > > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Signed-off-by: Peter Wu <peter at lekensteyn.nl>
> > > > ---
> > > > Tested on top of v4.7-rc5, the deadlock is gone.
> > >
> > > If we bother with this, it should imo be moved into the drm_fb_helper.c
> > > function drm_fb_helper_set_suspend(). But this also smells like some kind
> > > of bad duct-tape. I think Lukas is working on some other rpm vs. fbdev
> > > deadlocks, maybe we could fix them all with one proper fix? I've made some
> > > comments on Lukas' last patch series.
> >
> > This patch is only needed for drivers that use console_lock (for
> > drm_fb_helper_set_suspend) in their runtime resume functions.
> > Lukas posted fixes for runtime PM reference leaks, those are different
> > from this deadlock (see
> > https://lists.freedesktop.org/archives/dri-devel/2016-July/113005.html
> > for a backtrace for this issue).
> >
> > The deadlock could also be avoided if the device backing the fbcon is
> > somehow runtime-resumed outside the lock, but that feels like a larger
> > hack that does not seem easy.
> >
> > The i915 patch was done to reduce resume time (due to console_lock
> > contention), that feature seems useful to all other drivers too even if
> > the deadlock is fixed in a different way.
>
> I might have imagined something, but I thought Lukas is already working on
> some rpm vs. vga_switcheroo inversions. But looking again this was on the
> audio side.
>
> I think the proper solution for fbcon would be for the fbdev emulation to
> also hold a runtime pm references when the console is in use.
nouveau does this by adding a fb_open and fb_release function that calls
pm_runtime_get and pm_runtime_put respectively (and is the only driver
doing this). So that is why it causes the console_lock issue for
nouveau, but not for others.
> This should already happen correctly for vblank, the more tricky part
> is fbdev mmap and fbcon:
>
> - I have no idea, but there should be a way to intercept fbdev userspace
> mmaps and make sure that as long as an app has the fbdev mmapped we
> don't runtime suspend. No one really should be doing that (at least for
> normal setups), hence this shouldn't result in unsafe.
mmap normally needs a fd right? When the chardev /dev/fbX is opened, the
fb_open function will be called. So nouveau should not have this issue
with mmap/read/write to a fb while the device is suspended.
(This RPM thing was added in f231976c2e89 ("drm/nouveau/fbcon: take
runpm reference when userspace has an open fd"), maybe it is not a bad
idea for other drivers with RPM support either.)
> - For fbcon, we could suspend it in the dpms off callbacks (maybe with a
> timer), and resume it only when enabling fbcon again - fbcon needs to
> redraw anyway on dpms on.
Would this guarantee that the fb is suspended before/during suspend (of
the graphics device) and resumed somewhere during/after resume?
Suspending the fb also has the effect that reads/writes to /dev/fbN
fail, maybe that is a bit too restricted since the framebuffer is just
accessible until the device is suspended.
(Hmm, fb_read/fb_write does not seem to do any locking...)
> Another solution for fbcon might be to untangle the suspend/resume stuff
> and protect it by something else than console_lock. But that means
> fixing up fbcon locking horror shows.
console_lock seems needed for some code down the call stack, removing it
risks some blow ups.
Some archaeology: this locking problem was introduced with 054430e773c9
("fbcon: fix locking harder"). In the past fb_set_suspend also took the
fb_info lock but that was removed in 9e769ff3f585 ("fb: avoid possible
deadlock caused by fb_set_suspend").
Peter
> > My current plan is to move stuff out of the lock and allow (just)
> > resuming the console to be delayed. Some drivers (nouveau,
> > radeon/amdgpu, i915) do unnecessary stuff under the console lock:
> >
> > - nouveau: I *think* that cleraing/setting FBINFO_HWACCEL_DISABLED
> > (nouveau_fbcon_accel_restore) is safe outside the lock as the fb is
> > already suspended before clearing/after setting the flag.
> > - radeon: since the console is suspended, I don't think that that all
> > of the code is radeon_resume_kms is really needed.
> > - amdgpu: same as radeon. Btw, console_lock is leaked on an error path.
> > - i915: I think that clearing the fb memory can be done outside the
> > lock too as the console is suspended.
> >
> > Please correct me if my assumptions are flawed.
>
> Yeah, fixing this independent issues should definitely help, irrespective
> of how we fix fb_set_suspend.
>
> > > Besides this, when fixing a deadlock pls provide more details about the
> > > precise callchain and the locks involved in the deadlock. If you
> > > discovered this using lockdep, then just add the entire lockdep splat to
> > > the commit message. Otherwise there's lots of guesswork involved here.
> > > -Daniel
> >
> > There was no lockdep splat, it was triggered via the ioctl in the commit
> > message. I'll include the verbose trace from the previous mail in the
> > next proposed patch to reduce hunting though.
>
> Sounds good too.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
More information about the dri-devel
mailing list