[RFC 0/9] nuclear pageflip

Thu Sep 13 01:40:08 PDT 2012

On Wed, Sep 12, 2012 at 02:40:56PM -0500, Rob Clark wrote:
> On Wed, Sep 12, 2012 at 1:58 PM, Ville Syrjälä
> <ville.syrjala at linux.intel.com> wrote:
> > On Wed, Sep 12, 2012 at 01:00:19PM -0500, Clark, Rob wrote:
> >> On Wed, Sep 12, 2012 at 12:27 PM, Ville Syrjälä
> >> <ville.syrjala at linux.intel.com> wrote:
> >> > On Wed, Sep 12, 2012 at 10:48:16AM -0500, Rob Clark wrote:
> >> >> But I think we could still do this w/ one ioctl per crtc for atomic-pageflip.
> >> >
> >> > We could, if we want to sacrifice the synced multi display case. I just
> >> > think it might be a real use case at some point. IVI feels like the most
> >> > likely short term cadidate to me, but perhaps someone would finally
> >> > introduce some new style phone/tablet thingy. I have seen
> >> > concepts/prototypes of such multi display gadgets in the past, but the
> >> > industry apparently got a bit stuck on the rectangular slab with
> >> > touchscreen on one side design.
> >>
> >> I could be wrong, but I think IVI the screens would normally be too
> >> far apart to matter?
> >
> > I was thinking of something like a display on the dash that normally
> > sits low with only a small sliver visible, and extends upwards when
> > you fire up a movie player for example. Internally it could be made
> > up of two displays for power savings purposes.
> >
> >> Anyways, it is really only a problem if you can't do two ioctl()s
> >> within one vblank period. If it actually turns out to be a real
> >> problem,
> >
> > Well exactly that's the problem this whole atomic pageflip stuff is
> > trying to tackle, no? ;)
> >
> >> we could always add later an ioctl that takes an array of
> >> 'struct drm_mode_crtc_atomic_page_flip's.  I'm not sure if this is
> >> really useful or not.. but maybe I'm thinking too much about how
> >> weston does it's rendering of different output's independently.
> >
> > I'm just now thinking of surfaceflinger's way of doing things, with
> > its prepare and commit phases. If you need to issue two ioctls to handle
> > cloned displays, then you can end up in a somewhat funky situation.
> >
> > Let's say you have a video overlay in use (one each display naturally),
> > and you increase the downscaling factor enough so that you now have
> > enough memory bandwith to support only one overlay. With independent
> > check ioctls for each display, you never have the full device state
> > available in the kernel, so each check succeeds during the prepare
> > phase. So you decide that you can keep using the video overlays.
> >
> > You then proceed to commit the state, but after the first display has
> > been commited you get an error when trying to commit the second one.
> > What can you do now? The only option is to keep displaying the old
> > frame on the other displays for some time longer, and then on the
> > next frame you can switch to GPU composition. But on the next frame you
> > perhaps no longer need to use GPU composition, but since you can't
> > verify that in the prepare phase, you have no other option but to use
> > GPU composition.
> >
> > So when you run into a configuration that can be supported only
> > partially, you get animation stalls on some displays due to skipped
> > frames, and you always have to fall back to GPU composition for the
> > next frame.
> >
> > If on the other hand you would check the whole state in one ioctl,
> > you'd get the error in the first prepare phase, and could fall back
> > to GPU composition immediately.
> >
> > Am I too much of a perfectionst in considering such things? I don't
> > think so, but perhaps other people disagree.
> 
> Ok, if you have a case where the state of the two crtc's are not
> actually independent, then I think you have a valid point.
> 
> I'm not quite sure what userspace would do about it, though.. for the
> general case where vsync isn't locked, and you can't even necessarily
> assume vsync period is the same, then I don't really think you want to
> couple rendering to the displays.

I would say this is going to be the most common use case if you consider
just the number of shipping devices. It's pretty much what every Android
phone/tablet with a HDMI port has to do.

The solution we came up when working with Medfield was to throttle to
the internal display (usually ~60Hz), and let the external display drop
however many frames it needs. But I still wanted the external display to
always show the most recent frame possible, hence I came up with drm_flip
which supports that just fine. So we used three buffers and just issued
flips at the rate of the internal display, and the external display with
a 30Hz or 24Hz refresh rate ended up dropping some of the frames. It did
look fairly good actually. 

As a proof of the drm_flip implementation, I also did a quick hack where
I bumped the number of buffers up to ~15 or so, and then removed the
display related throttling completely (apart from preventing the GPU
from writing to an active scanout buffer). This allowed the app and
compositor to render at ~400 fps, and each displays would end up
displaying 60 or 30 of those frames each second without tearing. Now,
the only reason I needed so many buffers was that Android's buffer
management rotates buffers in strict order, and I would've needed to
modify that code to do proper triple buffering.

> OTOH, the 'test' step can come
> independent of vblank, so maybe you'd want to do the 'test' step
> together, and then the flip parts independently.  Hmm...

I had an idea about optimizing the check+commit by having check save the
checked state, and return a cookie of some sort. Then the commit phase
could simply pass in that same cookie, and the kernel wouldn't need to
re-check the state on commit. Obviously if any other mode setting
operation would happen in between the check and commit steps, the cookie
would need to be invalidated. This wouldn't work with the single check with
multiple commits. But I'm not sure such an optimization would be worthwile.

> >> I wonder.. if you have some special hw setup where you can keep the
> >> multiple display's in sync-lock, maybe a special virtual crtc would
> >> work.  That way userspace it appears as a single display.  I'm not
> >> really sure how crazy that would be to implement.
> >
> > Sure, add more abstraction layers and you can solve any problem :)
> 
> well, not really.. but my point was this sort of setup would be a
> somewhat custom hardware setup, so maybe some hack in that case isn't
> such a bad idea.  I dunno..

At least Medfield has support for such setups out of the box as it
were. You can hook up two DSI displays to it, and it can run the two
pipes in sync.

As a less relevant example I give you the Matrox G550 :) where you
actually have to sync the CRTCs if you want to drive both DVI outputs
at the same time. But I'll concede that the Matrox is a special case
since other independent dual output configurations on this chip don't
require such magic.

> >> But I think in the
> >> common case, where the displays are not vsync locked, treating the
> >> page flips of different crtc's independently, and rendering the
> >> contents for different outputs independently (like weston) makes the
> >> most sense.
> >
> > My API design doesn't preclude that at all. In light of my android tale
> > above how would weston decide whether it can use overlays in such a
> > case?
> 
> >From userspace API, I guess something like:
> 
> struct drm_mode_crtc_atomic_page_flip {
> 	uint32_t flags;
> 	uint32_t count_crtcs;
> 	uint64_t crtc_ids_ptr;  /* array of uint32_t */
> 	uint64_t count_props_ptr; /* array of uint32_t, # of prop's per crtc */
> 	uint64_t props_ptr;  /* ptr to array of drm_mode_obj_set_property */
> 	uint64_t user_data;
> };

Starting to look much like my drm_mode_atomic struct :)

Let's compare:

struct drm_mode_atomic {
	__u32 flags;
	__u32 count_objs;
	__u64 objs_ptr;
	__u64 count_props_ptr;
	__u64 props_ptr;
	__u64 prop_values_ptr;
	__u64 blob_values_ptr;
};

One differences seem to be that you have a mix of SOA and AOS concepts
in yours, whereas mine is pure SOA.

Also your struct has a bunch of redundant data since
drm_mode_obj_set_property keeps repeating the object ID needlessly, and
crtc_ids_ptr is also just repeating information already present in
drm_mode_obj_set_property. drm_mode_obj_set_property further forces you
to pass in the object type, which is pretty much useless.

Your version can't pass in blobs, which is going to be a problem
when you want to push in gamma tables/ramps and whatnot. Also I've been
considering that for atomic modesetting I'd pass display modes as blobs
too, since the display mode object IDs are currently hidden from user
space, and also the IDs are a bit volatile since the modes can be
shuffled around during EDID probing. I suppose one could try to fix
these display mode ID issues, and when you need to push in gamma tables
or other heavy weight data, you invent new types of drm objects to
describe them, and you pass them by ID as well.

-- 
Ville Syrjälä
Intel OTC