KMS timings (Re: [PATCH 6/8] drm/bochs: phase 3: provide a custom ->atomic_commit implementation)

Tue Jul 21 02:02:58 PDT 2015

On Tue, Jul 21, 2015 at 10:06:09AM +0300, Pekka Paalanen wrote:
> On Mon, 20 Jul 2015 10:32:31 -0700
> Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> 
> > On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> > > On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> > >> On Mon, 20 Jul 2015 01:58:33 -0700
> > >> Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> > >>
> > >> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > >> > > On Sun, 19 Jul 2015 17:20:32 -0700
> > >> > > Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> > >> > >
> > >> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > >> > >> >
> > >> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> > >> > >> > John Hunter <zhjwpku at gmail.com> wrote:
> > >> > >> >
> > >> > >> > > From: Zhao Junwang <zhjwpku at gmail.com>
> > >> > >> > >
> > >> > >> > > This supports the asynchronous commits, required for page-flipping.
> > >> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > >> > >> > > never have to wait for vblank.
> > >> > >> >
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> > >> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> > >> > >> >
> > >> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> > >> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> > >> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > >> > >> > will be running its rendering loop as a busy-loop, because the kernel
> > >> > >> > does not throttle it to the (virtual) display refresh rate.
> > >> > >> >
> > >> > >> > This will cause maximal CPU usage and poor user experience as
> > >> > >> > everything else needs to fight for CPU time and event dispatch to get
> > >> > >> > through, like input.
> > >> > >> >
> > >> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> > >> > >> > emulation based on a clock. Userspace expects page flips to happen at
> > >> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> > >> > >> > natural for userspace to drive its rendering loop based on the vblank
> > >> > >> > cycle.
> > >> > >>
> > >> > >>
> > >> > >> I've been asking myself the same question (for the UDL driver) and I'm
> > >> > >> not sure if this policy should go in the kernel. After all, there
> > >> > >> could be legitimate reasons for user space to render lots of frames
> > >> > >> per second. It seems to me that if user space doesn't want too many
> > >> > >> fps, it should just throttle itself.
> > >> > >
> > >> > > If userspace wants to render lots of frames per second, IMO it should
> > >> > > not be using vblank-synced operations in a way that may throttle it.
> > >> > > The lots of frames use case is already non-working for the majority of
> > >> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> > >> > >
> > >> > > The problem here I see is that one DRM driver decides to work different
> > >> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > >> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> > >> >
> > >> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> > >> >
> > >> > > Is it
> > >> > > too much to assume, that the video mode set in a driver (refresh rate)
> > >> > > corresponds to the vblank rate which implicitly delays the completion
> > >> > > of vblank-sync'd operations to at least the next vblank boundary?
> > >> >
> > >> > I think it's wrong to make user space think that a vsynced display
> > >> > always matches the refresh rate in a world where:
> > >> >
> > >> > - some displays have variable refresh rates (not just the fancy new
> > >> > stuff like g-sync, look for lvds_downclock in the intel driver for
> > >> > example, also consider DSI displays)
> > >> >
> > >> > - some displays have no refresh rate (the ones we are talking about
> > >> > here: udl, bochs...)
> 
> > > Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> > > think we should have an explicit range or flag to make userspace aware of
> > > what's going on.
> > 
> > I think the concept of vrefresh is flawed and not really future-proof
> > (I gave a few examples in my previous email). I agree we should keep
> > it as legacy, but we should add something else for the more advanced
> > cases.
> 
> Right, so let's add something new for new hardware features and keep
> the existing behavior existing.
> 
> I suppose the problem is that the existing behavior is not really
> documented so we have to resort to screaming users?
> 
> If one does not ask for ASYNC with a page flip, does it mean flipping
> on the next vblank, or flipping such that it cannot tear but allowing
> techniques like scanline waits?

Since legacy page_flip is always for the full primary plane you can't do
scanline waits - it covers everything anyway.

> It used to be reasonable to assume a constant refresh rate apart from
> explicit mode changes. Should we keep this assumption the default and
> add API to say different?
> 
> > >> > > I think, if the driver cannot implement proper semantics (which IMO
> > >> > > includes the throttling) for vblank-sync'd operations and it does not
> > >> > > want to fake them with a clock, it should just refuse vblank-synced
> > >> > > operations.
> > >> >
> > >> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> > >> > please let's not bake in another assumption in the API (or rather,
> > >> > let's try to un-bake it).
> > >>
> > >> Could you be more specific on everything, please?
> > >>
> > >> What should drivers do in different situations, what guarantees we do
> > >> have, and how does userspace predict the earliest possible flip time?
> > >> How do you define flip time to begin with, if it's not tied to the
> > >> scanout cycle (vblank)?
> > >>
> > >> How should a compositor schedule everything, and what can it tell the
> > >> clients about the timings in the immediate future?
> > >>
> > >> You gave me the feeling that everything I thought I knew and relied on
> > >> is wrong.
> > >
> > > I guess we either kick out page_flip for all drivers who fake it. And if
> > > that's causing regressions then we probably want to fake it with a timer.
> > > Unpretty, but such is the game of backwards compat forever. But I'm not
> > > sure whether we established that we have a problem already, at least I'm
> > > missing users screaming about udl/bochs & friends.
> 
> I suppose the reason we are not hearing users scream is that X.org is not
> affected, because it simply doesn't work in a way that would be affected? Or the
> various DDX'en. The combination of affected userspace with drivers like
> bochs is even more rare.
> 
> I first heard about Weston having an abysmal user experience on
> drm/bochs when a co-worker was looking at testing Weston in
> QEMU/stdvga, which according to others was supposed to be the best
> working QEMU output / DRM KMS driver combination. Might have something
> to do with needing it on a virtual ARM cpu.
> 
> Weston in a VM has not been too attractive before, because of the
> missing EGL platform support for swrast, but that has been fixed
> recently. I've also heard of users (RebeccaBlackOS) just reverting to
> Weston's fbdev backend, when the DRM backend just doesn't work right in
> a VM.
> 
> So I would say it is a known problem with Weston, but users tend to
> just dismiss it rather than start pushing it forward. Whether you count
> Weston as an existing user in the first place is up to you, I suppose.
> 
> I asked Jasper about Mutter:
> < pq> Jasper, trying to find users of the KMS API who would work badly,
>       if page flips were signalled always immediately
> < Jasper> pq, oh, god, us.
> < Jasper> pq, we'd spin ourselves in a paint loop to death
> 
> I think there might be two different problems here: a) signalling page
> flips immediately, and b) "variable refresh rate" systems / on-demand
> updates like g-sync, UDL, etc.
> 
> My immediate concern is to outlaw immediate signalling of operations
> that are intended to be vblank-synced (those that have
> traditionally taken time to complete on real hardware, if you don't
> like the definition of vblank-synced).
> 
> The problem with variable refresh rate systems is unpredictability,
> which is different. Submitting updates will take some time in any case,
> so you cannot really loop to death (I hope). There I would be happy to
> just know that predictions based on perpetual refresh rate are invalid.
> That would be enough for Wayland at first.
> 
> Stéphane also raised the concern that scanout downclocking etc. may
> cause flips to be quite late from predicted. This is less of a problem,
> rendering can also take arbitrary times and software is usually written
> to deal with missing deadlines. It's a problem that happens all the
> time anyway. As it is a problem with prediction accuracy, we could just
> wave it off by setting the imaginary "no constant refresh rate" flag.
> IMHO this is a problem that should be solved at another occasion, if
> necessary.

Downclocking should only be able to delay the very first frame in a
sequence (video playback, animation). Imo we can shrug that off as "clock
skew" ;-)

Imo once someone updates frames regularly vblank timestamps should be
evenly spaced and pageflips not instant (at least by default).

> > Yeah I don't think I care about the old interface, it is what it is.
> > But we should design something which works well for the future use
> > cases.
> 
> Are you referring to the atomic API as the old thing?
> 
> In the end, the question is how paranoid should display servers be
> about the timings? Do we need to fix Weston, Mutter and likely others
> to deal with page flips being signalled immediately, or is it more
> appropriate to fix the DRM drivers to keep up the illusion of providing
> what the API appears to suggest?

I guess they can't assume too much about vblanks (too many drivers don't
even bother with precise/irq-delay-correct timestamps), but I think it is
reasonable to assume that a renderer driven by page_flip completion events
won't end up spinning.

I guess for bochs/udl and others we could create a small drm driver which
keeps track of the last vblank ts (we have those already) and suitably
delays the event/timestamp to keep up the illusion. Or we just rip out
pageflip support for those drivers. But if weston&co can't cope with that,
that would be worse.
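
To make the "delay the event" idea a bit more concrete, here's a rough
sketch of the timestamp arithmetic involved. None of this is existing DRM
code: the function name, the nanosecond units and the 60 Hz fallback are all
made up for illustration. It only picks the next fake vblank on a fixed grid
anchored at the last vblank timestamp; arming a timer and sending the event
is left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an existing DRM function): given the timestamp of
 * the last emulated vblank and the current time, both in nanoseconds, return
 * the timestamp at which the next flip-completion event should be sent.
 *
 * Assumption: a fixed refresh period derived from the mode's vrefresh, so
 * the fake vblanks stay evenly spaced on a grid anchored at last_vblank_ns.
 */
uint64_t fake_vblank_next_ns(uint64_t last_vblank_ns, uint64_t now_ns,
                             uint32_t vrefresh_hz)
{
	uint64_t period_ns, elapsed_ns;

	/* Assumed fallback when the mode reports no refresh rate. */
	if (vrefresh_hz == 0)
		vrefresh_hz = 60;

	period_ns = 1000000000ull / vrefresh_hz;

	/* Clock went backwards or the last vblank is in the future. */
	if (now_ns < last_vblank_ns)
		return last_vblank_ns + period_ns;

	/* Round up to the first grid point strictly after now_ns. */
	elapsed_ns = now_ns - last_vblank_ns;
	return last_vblank_ns + (elapsed_ns / period_ns + 1) * period_ns;
}
```

A flip submitted at any point within a period would complete at the same
grid point, so a flip-driven compositor gets at most one completion event
per emulated refresh instead of spinning.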
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch