KMS timings (Re: [PATCH 6/8] drm/bochs: phase 3: provide a custom ->atomic_commit implementation)

Mon Jul 20 07:21:00 PDT 2015

On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> On Mon, 20 Jul 2015 01:58:33 -0700
> Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> 
> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > > On Sun, 19 Jul 2015 17:20:32 -0700
> > > Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> > >
> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > >> >
> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> > >> > John Hunter <zhjwpku at gmail.com> wrote:
> > >> >
> > >> > > From: Zhao Junwang <zhjwpku at gmail.com>
> > >> > >
> > >> > > This supports the asynchronous commits, required for page-flipping
> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > >> > > never have to wait for vblank.
> > >> >
> > >> > Hi,
> > >> >
> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> > >> >
> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > >> > will be running its rendering loop as a busy-loop, because the kernel
> > >> > does not throttle it to the (virtual) display refresh rate.
> > >> >
> > >> > This will cause maximal CPU usage and poor user experience as
> > >> > everything else needs to fight for CPU time and event dispatch to get
> > >> > through, like input.
> > >> >
> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> > >> > emulation based on a clock. Userspace expects page flips to happen at
> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> > >> > natural for userspace to drive its rendering loop based on the vblank
> > >> > cycle.
> > >>
> > >>
> > >> I've been asking myself the same question (for the UDL driver) and I'm
> > >> not sure if this policy should go in the kernel. After all, there
> > >> could be legitimate reasons for user space to render lots of frames
> > >> per second. It seems to me that if user space doesn't want too many
> > >> fps, it should just throttle itself.
> > >
> > > If userspace wants to render lots of frames per second, IMO it should
> > > not be using vblank-synced operations in a way that may throttle it.
> > > The lots of frames use case is already non-working for the majority of
> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> > >
> > > The problem here I see is that one DRM driver decides to work different
> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> > 
> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> > 
> > > Is it
> > > too much to assume, that the video mode set in a driver (refresh rate)
> > > corresponds to the vblank rate which implicitly delays the completion
> > > of vblank-sync'd operations to at least the next vblank boundary?
> > 
> > I think it's wrong to make user space think that a vsynced display
> > always matches the refresh rate in a world where:
> > 
> > - some displays have variable refresh rates (not just the fancy new
> > stuff like g-sync, look for lvds_downclock in the intel driver for
> > example, also consider DSI displays)
> > 
> > - some displays have no refresh rate (the ones we are talking about
> > here: udl, bochs...)
> 
> That means that refresh rate in a video mode is bogus. Can userspace
> know when the refresh rate is meaningless? I suppose there are two
> different cases of meaningless, too: when the driver ignores it as
> input argument, and when it is used but has no guarantees for timings.
> 
> Assuming it's always meaningless wrt. timings is pretty harsh. E.g. the
> Wayland Presentation extension's implementation in Weston uses the
> refresh rate to predict the next flip time and hands it out to clients
> for scheduling/interpolation purposes.

We removed lvds_downclock in i915 (was never enabled by default). The new
code is currently edp only, but enabled by default. And it tries _really_
hard to keep up the illusion that the requested vrefresh is the one you
actually have - it upclocks every time userspace changes the screen.

The only exception is when you only ask for vblank events, and the delay
to the next pageflip is longer than the timeout. That can't be fixed right
now because drm_irq.c is the last midlayer needed by modern drivers, but
I'd like to fix it eventually.

There's future plans to allow userspace to explicitly ask for a reduced
vrefresh (e.g. video playback), but then obviously they'll match again
too.

We also have a big pile of testcases in kms_flip which check that the
timestamps are evenly spaced and agree with vrefresh. There's some oddball
bugs around tv-out that we never bothered to fix (since the requested mode
is a fake one and rescaled by the TV port, which also has it's own clock).

Imo aiming for vrefresh to be accurate is good. For gsync and friends I
think we should have an explicit range or flag to make userspace aware of
what's going on.

> > - you can do partial vsynced updates by just waiting for a specific
> > scanline range which again breaks the assumption that "vsynced" ==
> > "refreshes at the monitor rate". In this case there is no visible
> > tearing (in that sense it is vsynced) but the flip time is not
> > predictable using the refresh rate.
> 
> Okay. That also invalidates the design (well, at least the
> implementation, and sounds like DRM does not give any tools to allow
> implementing it) the Wayland Presentation extension even on "good"
> hardware, so nice to realize. I was already suggesting we should
> stabilize it since it looks good, but this puts it all back to the
> drawing board.
> 
> I think it also mostly invalidates the whole scheduling implementation
> in Weston.

Currently scanline waits is just something the intel DDX does to update
dri and video clients without tearing while still essentially doing
unsynced frontbuffer rendering. It sucks down massive amounts of gpu
bandwidth since it fully stalls the rendering (at least on i915 hw).

As long as you have buffer_age and friends you should be able to be as
efficient with pageflips (or better) as with scanline waits.

> > So I don't think we should perpetuate that problem. And I would like
> > user space to "see" the actual flip times to enable some kind of
> > scheduling where possible.
> > 
> > >
> > > I think, if the driver cannot implement proper semantics (which IMO
> > > includes the throttling) for vblank-sync'd operations and it does not
> > > want to fake them with a clock, it should just refuse vblank-synced
> > > operations.
> > 
> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> > please let's not bake in another assumption in the API (or rather,
> > let's try to un-bake it).
> 
> Could you be more specific on everything, please?
> 
> What should drivers do in different situations, what guarantees we do
> have, and how does userspace predict the earliest possible flip time?
> How do you define flip time to begin with, if it's not tied to the
> scanout cycle (vblank)?
> 
> How should a compositor schedule eveything, and what can it tell to the
> clients about the timings in the immediate future?
> 
> You gave me the feeling that everything I thought I knew and relied on
> is wrong.

I guess we either kick out page_flip for all drivers who fake it. And if
that's causing regressions then we probably want to fake it with a timer.
Unpretty, but such is the game of backwards compat forever. But I'm not
sure whether we established that we have a problem already, at least I'm
missing users screaming about udl/bochs & friends.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch