[Intel-gfx] [PATCH 6/9] drm/i915: add frontbuffer tracking to FBC

Wed Feb 11 00:25:52 PST 2015

On Tue, Feb 10, 2015 at 10:19:10AM -0200, Paulo Zanoni wrote:
> 2015-02-09 17:05 GMT-02:00 Daniel Vetter <daniel at ffwll.ch>:
> > Ok, here comes the gist of what needs to be changed.
> > - You need to check the fbc-private copy of busy bits, otherwise the
> >   filtering for the GTT origin won't work. We want to keep fbc enabled (or
> >   renable) even when gtt is still invalidated somehow.
> > - That means when you decide to reenable fbc (i.e.
> >   dev_priv->fbc.busy_frontbuffer_bits == 0) then it's guranteed that fbc
> >   is already disabled, and a plain intel_fbc_enable is good enough.
> > - nuke in flush is not enough and too late - the frontbuffer rendering
> >   could have been going on for a while since you've seen the invalidate,
> >   the flush only happens when everything is quiet again. A nuke at the end
> >   is hence too late and not enough.
> >
> > This way you don't need the origin information in the flush helper.
> >
> > One bit I'm not yet sure off is where to put intel_fbc_update to
> > reallocate buffers over pageflips. The complication since you've written
> > these patches is the atomic flips, which can now asynchronously update the
> > primary plane (and change the pixel format), but different from the
> > pageflip code the atomic paths don't disable/enable fbc over a flip.
> >
> > Which means that doing the intel_fbc_update just somewhere in between the
> > disable and enable won't be enough to keep everything in sync.
> >
> > I think what we ultimately need is a bit of logic in the atomic_check part
> > which decides whether fbc must be disabled/enabled (e.g. when we need to
> > realloc the cfb) around the update. And then we'll place explicit
> > intel_fbc_disable/enable calls in the atomic_begin/flush callbacks.
> >
> > The other case is just updating the gtt fence data. At least on g4x+ we
> > should try not to disable fbc but keep it going over pageflips. Which
> > means there could be interesting races between the flip and gtt writes
> > right afterwards (which aren't synchronized in any way).
> >
> > I see two options:
> > - We update the fence register whenever we want, but while a pageflip is
> >   pending we don't filter out gtt invalidates (since we kinda don't know
> >   whether the hw invalidate will hit before/after the flip happens). This
> >   one is a bit tricky, we'd need to add fbc callbacks to
> >   intel_frontbuffer_flip_prepare/complete.
> > - We do a manual nuke after each flip by adding calls to intel_fbc_flip to
> >   intel_frontbuffer_flip and flip_complete. This one has the downside that
> >   proper userspace which doesn't do gtt writes will get hurt by nuking the
> >   fbc twice: Once for the flip and once right afterwards.
> 
> Thanks for the review!
> 
> If you can think of any way to write a test case to reproduce the
> problems you pointed above, please give me ideas! I'll try to do this
> while rewriting the patches, but maybe you already have something in
> your mind? I guess we need to try to find a way to get an invalidate
> but only get the corresponding flush a long time (> 1 frame) after
> that? Maybe keep the frame busy with tons of operations that don't
> change the CRC?

Well that's going to be fairly hard since you need to race pageflips with
gtt writes and then try to sneak in gtt writes for the new buffer in
before the fence and everything is set up correctly.

The time between invalidate and flush isn't a problem since for GTT mmaps
it's forever by default. So as long as you don't do any set_domain/execbuf
or anything else GTT mmaps will stay invalidated. But since we want to
filter out GTT invalidates (the hw actually works for that case!) the
problem creeps in when we update the gtt fbc fence in a racy way. The
frontbuffer tracking stuff is race-free (avoiding that race is why there's
the flip_prepare/complete split) itself.

If you want to try the following might work to reproduce this case:

Prerequisites: We don't disable fbc any more over pageflips, which we
  can do on g4x as long as the pixel format stays the same.

1. Take reference crc for black, pain test buffer white.
2. Pageflip to that white buffer.
3. Immediately afterwards (i.e. before the pageflip completed) draw the
buffer black through gtt mmaps with all the set_domain stuff that's
required.
4. Wait a few frames, grab crc.

And now that I've describe in detail there's actually no race here as long
as we update the fbc fence _before_ we do the pageflip. Some invalidates
will hit the old buffer still, but the pageflip will invalidate everything
anyway. And any gtt writes after the flip will do line invalidation
correctly.

So I don't think we'll have a problem, and I also don't think we need
another testcase really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch