simpledrm, running display servers, and drivers replacing simpledrm while the display server is running

Jonas Ådahl jadahl at gmail.com
Fri May 10 13:11:13 UTC 2024


On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > 
> > There are two problems at hand; one is the race condition during boot
> > when the login screen (or whatever display server appears first) is
> > launched with simpledrm, only some moments later having the real GPU
> > driver appear.
> > 
> > The other is general purpose GPU hotplugging, including the unplugging
> > the GPU decided by the compositor to be the primary one.
> 
> The situation of booting with simpledrm (problem 2) is a special case of
> problem 1. From the kernel's perspective, unloading simpledrm is the same as
> what you call general purpose GPU hotplugging. Even through there is not a
> full GPU, but a trivial scanout buffer. In userspace, you see the same
> sequence of events as in the general case.

Sure, in a way it is, but the consequence and frequency of occurence is
quite different, so I think it makes sense to think of them as different
problems, since they need different solutions. One is about fixing
userspace components support for arbitrary hotplugging, the other for
mitigating the race condition that caused this discussion to begin with.

> 
> > 
> > The latter is something that should be handled in userspace, by
> > compositors, etc, I agree.
> > 
> > The former, however, is not properly solved by userspace learning how to
> > deal with primary GPU unplugging and switching to using a real GPU
> > driver, as it'd break the booting and login experience.
> > 
> > When it works, i.e. the race condition is not hit, is this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with the real GPU driver
> >   * The login screen interface is smoothly animating using hardware
> >     accelerating, presenting "advanced" graphical content depending on
> >     hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > 
> > If the race condition is hit, with a compositor supporting primary GPU
> > hotplugging, it'll work like this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with simpledrm
> >   * Due to using simpldrm, the login screen interface is not animated and
> >     just plops up, and no "advanced" graphical content is enabled due to
> >     apparent missing hardware capabilities
> >   * The real GPU driver appears, the login screen now starts to become
> >     animated, and may suddenly change appearance due to capabilties
> >     having changed
> > 
> > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > still end up with a glitchy boot experience, and it forces userspace to
> > add things like sleep(10) to work around this.
> > 
> > In other words, fixing userspace is *not* a correct solution to the
> > problem, it's a work around (albeit a behaivor we want for other
> > reasons) for the race condition.
> 
> To really fix the flickering, you need to read the old DRM device's atomic
> state and apply it to the new device. Then tell the desktop and applications
> to re-init their rendering stack.
> 
> Depending on the DRM driver and its hardware, it might be possible to do
> this without flickering. The key is to not loose the original scanout
> buffer, while not probing the new device driver. But that needs work in each
> individual DRM driver.

This doesn't sound like it'll fix any flickering as I describe them.
First, the loss of initial animation when the login interface appears is
not something one can "fix", since it has already happened.

Avoiding flickering when switching to the new driver is only possible
if one limits oneself to what simpledrm was capable of doing, i.e. no
HDR signaling etc.

> 
> > 
> > Arguably, the only place a more educated guess about whether to wait or
> > not, and if so how long, is the kernel.
> 
> As I said before, driver modules come and go and hardware devices come and
> go.
> 
> To detect if there might be a native driver waiting to be loaded, you can
> test for
> 
> - 'nomodeset' on the command line -> no native driver

Makes sense to not wait here, and just assume simpledrm forever.

> - 'systemd-load-modules' not started -> maybe wait
> - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> maybe wait

I suspect this is not useful for general purpose distributions. I have
43 kernel GPU modules there, on a F40 installation.

> - maybe udev can tell you more
> - it might for detection help that recently simpledrm devices refer to their
> parent PCI device
> - maybe systemd tracks the probed devices

If the kernel already plumbs enough state so userspace components can
make a decent decision, instead of just sleeping for an arbitrary amount
of time, then great. This is to some degree what
https://github.com/systemd/systemd/issues/32509 is about.


Jonas

> 
> Best regards
> Thomas
> 
> > 
> > 
> > Jonas
> > 
> > > The next best solution is to keep the final DRM device open until a new one
> > > shows up. All DRM graphics drivers with hotplugging support are required to
> > > accept commands after their hardware has been unplugged. They simply won't
> > > display anything.
> > > 
> > > Best regards
> > > Thomas
> > > 
> > > 
> > > > Thanks
> > > > 
> > > -- 
> > > --
> > > Thomas Zimmermann
> > > Graphics Driver Developer
> > > SUSE Software Solutions Germany GmbH
> > > Frankenstrasse 146, 90461 Nuernberg, Germany
> > > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> > > HRB 36809 (AG Nuernberg)
> > > 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 


More information about the dri-devel mailing list