[RFC 0/9] nuclear pageflip

Fri Sep 14 10:02:45 PDT 2012

On Fri, Sep 14, 2012 at 11:29:04AM -0500, Rob Clark wrote:
> On Fri, Sep 14, 2012 at 10:48 AM, Ville Syrjälä
> <ville.syrjala at linux.intel.com> wrote:
> > On Fri, Sep 14, 2012 at 09:45:18AM -0500, Rob Clark wrote:
> >> On Fri, Sep 14, 2012 at 8:58 AM, Ville Syrjälä
> >> <ville.syrjala at linux.intel.com> wrote:
> >> > On Fri, Sep 14, 2012 at 08:25:53AM -0500, Rob Clark wrote:
> >> >> On Fri, Sep 14, 2012 at 7:50 AM, Ville Syrjälä
> >> >> <ville.syrjala at linux.intel.com> wrote:
> >> >> > On Thu, Sep 13, 2012 at 11:35:59AM -0500, Rob Clark wrote:
> >> >> >> On Thu, Sep 13, 2012 at 9:29 AM, Ville Syrjälä
> >> >> >> <ville.syrjala at linux.intel.com> wrote:
> >> >> >> > On Thu, Sep 13, 2012 at 08:39:54AM -0500, Rob Clark wrote:

> > I wonder if you've though about omap's FIFO merge. It can cause similar
> > issues, that is some operations may need two vblanks to complete. And it
> > looks like I'll get to worry about this stuff too since there are some
> > watermark related wait_for_vblank() workarounds in the IVB sprite code,
> > sigh.
> 
> yeah, FIFO merge is a nice big headache.. and not really ideal for
> latency unless you have some advanced warning to disable FIFO merge
> before userspace wants to switch on an extra overlay.
> 
> I think the best way to deal is just start switching off FIFO merge
> when userspace first does test w/ overlay, but return EBUSY.  It means
> we'll use the gpu for rendering for one frame, but I think that is
> better than blocking the compositor for a vblank or two.  Thou shalt
> not block the compositor.

Yeah, I suppose it could be handled through another property as well.
Perhaps some kind of "LOW_POWER_OPTIMIZATIONS" property that you'd
disable one vblank in advance. But then it's starting to be a bit
hardware specific ie. you more or less have to know the circumstances
when the property must be disabled. On omap it would be when more than
one plane is used, on IVB it appears that you need it when you want to
enable scaling.

I'm not too thrilled about the idea that the test phase would actually
touch the hardware. What happens if you do multiple test steps between
commits? But I can't immediately think of a good solution that would
avoid the need for hardware specific knowledge.

> >> And if we do support multiple crtc's w/ pageflip, I'm not sure if
> >> there is a good way to enforce two-steps.  Having a standardized way
> >> to tell userspace to try later seems like a good thing.
> >
> > Sure, for that it seems reasonable.
> >
> >> >> >> >> Also, if you pageflip on multiple CRTC's, should the be multiple
> >> >> >> >> vblank events, and multiple userdata's?
> >> >> >> >
> >> >> >> > That's a bit of an open question. I was considering several options:
> >> >> >>
> >> >> >> the thing I like about one ioctl per crtc is that it avoids this whole
> >> >> >> question..
> >> >> >>
> >> >> >> And, I think as long as you have to update multiple different scanout
> >> >> >> address registers, there is always going to be a race in multi-crtc
> >> >> >> flipping.  Having a single ioctl does make the race smaller.  I'm not
> >> >> >> sure how important that point is.
> >> >> >
> >> >> > Which race?
> >> >>
> >> >> ie. if you set REG_CRTC1_ADDR just immediately before vblank and
> >> >> REG_CRTC2_ADDR just after
> >> >
> >> > Well, with unsynced crtcs I wouldn't call that any kind of meaningful race.
> >> > The same problem after all exists even with a single crtc. You either make
> >> > the deadline and write the register before vblank, or you don't make it
> >> > and end up with a repeated frame.
> >>
> >> I meant w/ sync'd crtc's, there is still no 100% guarantee that the
> >> two flip at the same time.
> >
> > Sure there is. That's what the vblank evade stuff gives you. I just
> > happen to need it even when using just one crtc because the hardware
> > doesn't have the necessary mechanism to flip several planes atomically.
> 
> hmm, I guess I don't quite follow then.  But I guess I don't know the
> intel hw well enough.  It seemed like you weren't atomically updating
> scanout registers.

I guarantee the atomicity by making sure I'm not too close to the start
of vblank when I write the registers. It's a very generic solution that
will work on any hardware with double buffered registers that get
flipped on vblank. Even if some of the registers would get flipped at
slightly different times (eg. plane A flips at vbl_start+1, plane B at
vbl_start+10) you could still use this method by extending the range of
scanlines to be avoided.

I suppose the most difficult bit is determining how long you need to
write all the necessary registers. You want to make it long enough to
guarantee atomic operation, but short enough to avoid blocking
needlessly. Currently I just have a hardcoded number there. If the
driver is going to be deployed on a specific device, it's easy enough
to measure it and tweak the number, but it would be nice to have the
driver calibrate itself. Or you just have a sysfs knob so that it
could tweaked more easily for specific systems by non-developers.

> But anyways, I think it is probably ok to not need the crtc up-front.
> We can catch issues w/ pending vblank at the atomic_test() stage.
> Still not sure what to do about userdata.

I think that if the event carries the crtc id, then the user_data could
be shared across crtcs. User space can ref count stuff if necessary.
When done this way you can think if the user_data as a cookie that
identifies the flip event(s) originating from a specific flip request.

> Although I suppose we could
> make userdata a property attached to crtc and/or plane and that gives
> userspace plenty of flexibility about how many events it wants back.
> (Ie. no event if userdata==0.. or maybe separate send-event property.)

Hmm. That's an interesting idea. I'll have to give it some thought.

-- 
Ville Syrjälä
Intel OTC