KMS timings (Re: [PATCH 6/8] drm/bochs: phase 3: provide a custom ->atomic_commit implementation)

Tue Jul 21 00:06:09 PDT 2015

On Mon, 20 Jul 2015 10:32:31 -0700
Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:

> On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> >> On Mon, 20 Jul 2015 01:58:33 -0700
> >> Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> >>
> >> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> >> > > On Sun, 19 Jul 2015 17:20:32 -0700
> >> > > Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> >> > >
> >> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> >> > >> >
> >> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> >> > >> > John Hunter <zhjwpku at gmail.com> wrote:
> >> > >> >
> >> > >> > > From: Zhao Junwang <zhjwpku at gmail.com>
> >> > >> > >
> >> > >> > > This supports the asynchronous commits, required for page-flipping
> >> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> >> > >> > > never have to wait for vblank.
> >> > >> >
> >> > >> > Hi,
> >> > >> >
> >> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> >> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> >> > >> >
> >> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> >> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> >> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> >> > >> > will be running its rendering loop as a busy-loop, because the kernel
> >> > >> > does not throttle it to the (virtual) display refresh rate.
> >> > >> >
> >> > >> > This will cause maximal CPU usage and poor user experience as
> >> > >> > everything else needs to fight for CPU time and event dispatch to get
> >> > >> > through, like input.
> >> > >> >
> >> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> >> > >> > emulation based on a clock. Userspace expects page flips to happen at
> >> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> >> > >> > natural for userspace to drive its rendering loop based on the vblank
> >> > >> > cycle.
> >> > >>
> >> > >>
> >> > >> I've been asking myself the same question (for the UDL driver) and I'm
> >> > >> not sure if this policy should go in the kernel. After all, there
> >> > >> could be legitimate reasons for user space to render lots of frames
> >> > >> per second. It seems to me that if user space doesn't want too many
> >> > >> fps, it should just throttle itself.
> >> > >
> >> > > If userspace wants to render lots of frames per second, IMO it should
> >> > > not be using vblank-synced operations in a way that may throttle it.
> >> > > The lots of frames use case is already non-working for the majority of
> >> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> >> > >
> >> > > The problem here I see is that one DRM driver decides to work different
> >> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> >> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> >> >
> >> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> >> >
> >> > > Is it
> >> > > too much to assume, that the video mode set in a driver (refresh rate)
> >> > > corresponds to the vblank rate which implicitly delays the completion
> >> > > of vblank-sync'd operations to at least the next vblank boundary?
> >> >
> >> > I think it's wrong to make user space think that a vsynced display
> >> > always matches the refresh rate in a world where:
> >> >
> >> > - some displays have variable refresh rates (not just the fancy new
> >> > stuff like g-sync, look for lvds_downclock in the intel driver for
> >> > example, also consider DSI displays)
> >> >
> >> > - some displays have no refresh rate (the ones we are talking about
> >> > here: udl, bochs...)

> > Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> > think we should have an explicit range or flag to make userspace aware of
> > what's going on.
> 
> I think the concept of vrefresh is flawed and not really future-proof
> (I gave a few examples in my previous email). I agree we should keep
> it as legacy, but we should add something else for the more advanced
> cases.

Right, so let's add something new for new hardware features and keep
the existing behavior as it is.

I suppose the problem is that the existing behavior is not really
documented so we have to resort to screaming users?

If one does not ask for ASYNC with a page flip, does it mean flipping
on the next vblank, or flipping such that it cannot tear but allowing
techniques like scanline waits?

It used to be reasonable to assume a constant refresh rate apart from
explicit mode changes. Should we keep this assumption as the default
and add API to say otherwise?

> >> > > I think, if the driver cannot implement proper semantics (which IMO
> >> > > includes the throttling) for vblank-sync'd operations and it does not
> >> > > want to fake them with a clock, it should just refuse vblank-synced
> >> > > operations.
> >> >
> >> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> >> > please let's not bake in another assumption in the API (or rather,
> >> > let's try to un-bake it).
> >>
> >> Could you be more specific on everything, please?
> >>
> >> What should drivers do in different situations, what guarantees we do
> >> have, and how does userspace predict the earliest possible flip time?
> >> How do you define flip time to begin with, if it's not tied to the
> >> scanout cycle (vblank)?
> >>
> >> How should a compositor schedule everything, and what can it tell to the
> >> clients about the timings in the immediate future?
> >>
> >> You gave me the feeling that everything I thought I knew and relied on
> >> is wrong.
> >
> > I guess we either kick out page_flip for all drivers who fake it. And if
> > that's causing regressions then we probably want to fake it with a timer.
> > Unpretty, but such is the game of backwards compat forever. But I'm not
> > sure whether we established that we have a problem already, at least I'm
> > missing users screaming about udl/bochs & friends.

I suppose we are not hearing users scream because X.org (or the
various DDX'en) simply does not work in a way that would be affected?
The combination of affected userspace with drivers like bochs is even
rarer.

I first heard about Weston having an abysmal user experience on
drm/bochs when a co-worker was looking at testing Weston in
QEMU/stdvga, which according to others was supposed to be the best
working QEMU output / DRM KMS driver combination. Might have something
to do with needing it on a virtual ARM CPU.

Weston in a VM has not been too attractive before, because of the
missing EGL platform support for swrast, but that has been fixed
recently. I've also heard of users (RebeccaBlackOS) simply reverting
to Weston's fbdev backend when the DRM backend doesn't work right in
a VM.

So I would say it is a known problem with Weston, but users tend to
just dismiss it rather than start pushing it forward. Whether you count
Weston as an existing user in the first place is up to you, I suppose.

I asked Jasper about Mutter:
< pq> Jasper, trying to find users of the KMS API who would work badly,
      if page flips were signalled always immediately
< Jasper> pq, oh, god, us.
< Jasper> pq, we'd spin ourselves in a paint loop to death

I think there might be two different problems here: a) signalling page
flips immediately, and b) "variable refresh rate" systems / on-demand
updates like g-sync, UDL, etc.

My immediate concern is to outlaw immediate signalling of operations
that are intended to be vblank-synced (those that have
traditionally taken time to complete on real hardware, if you don't
like the definition of vblank-synced).
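
For drivers without a real scanout cycle, the clock-based emulation
suggested earlier in the thread boils down to delaying the completion
event to the next boundary of an imaginary refresh cycle. A minimal
sketch of that arithmetic follows, written as plain userspace C rather
than kernel code. The function names, the epoch parameter and the use
of nanosecond timestamps are all assumptions for illustration, not
existing DRM API.

```c
#include <assert.h>
#include <stdint.h>

/* Refresh period in nanoseconds for a mode's vrefresh given in Hz. */
static uint64_t refresh_period_ns(uint32_t vrefresh_hz)
{
    assert(vrefresh_hz > 0);
    return 1000000000ull / vrefresh_hz;
}

/*
 * Earliest emulated vblank strictly after now_ns, for a refresh cycle
 * that started at epoch_ns (e.g. the time of the last modeset).
 * A driver faking vblank would send the flip event at this time
 * instead of immediately.
 */
static uint64_t next_vblank_ns(uint64_t now_ns, uint64_t epoch_ns,
                               uint64_t period_ns)
{
    uint64_t cycles;

    assert(period_ns > 0);
    assert(now_ns >= epoch_ns);

    cycles = (now_ns - epoch_ns) / period_ns + 1;
    return epoch_ns + cycles * period_ns;
}
```

A flip submitted exactly on a boundary is pushed to the following one
here, so no flip completes in zero time. Whether that is the right
policy is a design choice this sketch does not settle.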

The problem with variable refresh rate systems is unpredictability,
which is different. Submitting updates will take some time in any case,
so you cannot really loop to death (I hope). There I would be happy to
just know that predictions based on a constant refresh rate are
invalid. That would be enough for Wayland at first.
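
From the compositor side, that could look roughly like the sketch
below. Everything in it is hypothetical: struct output_timing, the
constant_refresh field (standing in for a flag the kernel does not
currently provide) and predict_next_flip() are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-output timing state kept by a compositor. */
struct output_timing {
    uint64_t last_flip_ns;   /* timestamp of the last completed flip */
    uint64_t period_ns;      /* nominal refresh period from the mode */
    bool constant_refresh;   /* false: refresh-based predictions are
                              * invalid for this output */
};

/*
 * Predict when a flip submitted at now_ns would complete, assuming a
 * constant refresh rate. Returns false, leaving *out_ns untouched,
 * when no such prediction can be made.
 */
static bool predict_next_flip(const struct output_timing *t,
                              uint64_t now_ns, uint64_t *out_ns)
{
    uint64_t cycles;

    assert(t != NULL && out_ns != NULL);

    if (!t->constant_refresh || t->period_ns == 0 ||
        now_ns < t->last_flip_ns)
        return false;

    cycles = (now_ns - t->last_flip_ns) / t->period_ns + 1;
    *out_ns = t->last_flip_ns + cycles * t->period_ns;
    return true;
}
```

When the function returns false, the compositor would have to avoid
telling clients anything about future timings for that output.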

Stéphane also raised the concern that scanout downclocking etc. may
cause flips to complete much later than predicted. This is less of a
problem: rendering can also take an arbitrary time, and software is
usually written to deal with missed deadlines. It's a problem that
happens all the time anyway. As it is a problem with prediction
accuracy, we could just wave it off by setting the imaginary "no
constant refresh rate" flag. IMHO this is a problem that should be
solved on another occasion, if necessary.

> Yeah I don't think I care about the old interface, it is what it is.
> But we should design something which works well for the future use
> cases.

Are you referring to the atomic API as the old thing?

In the end, the question is how paranoid should display servers be
about the timings? Do we need to fix Weston, Mutter and likely others
to deal with page flips being signalled immediately, or is it more
appropriate to fix the DRM drivers to keep up the illusion of providing
what the API appears to suggest?

Thanks,
pq