[Intel-gfx] RFC frontbuffer write tracking

Thu Jul 18 09:40:53 CEST 2013

On Thu, Jun 20, 2013 at 06:18:27PM +0100, Chris Wilson wrote:
> Having thrown around a few ideas for how to do PSR and FBC write
> tracking on the frontbuffer, including the creation of a new SCANOUT
> domain, the implementation that I've settled on is to detect writes to
> either the GTT domain (hmm, a CPU write flush should also set the fb as
> dirty) and then a deferred SCANOUT flush. The deferred flush then resets
> the write domain and kicks the GTT mapping so that any further dumb
> writes (i.e. dumb kms buffers or fbcon) cause a new frontbuffer
> invalidation. This patch series also introduces framebuffer parameters
> (which maybe should be properties) to allow userspace to opt out of the
> dumb mechanism and elect to call dirtyfb itself when it requires the
> scanout to be flushed.
>
> I've been playing around with GFDT on SNB and IVB as a means for
> prototyping the delayed updates and framebuffer parameters.

Sorry for the massive delay in writing down my review here, it's been
sitting on my table for way too long.

Mostly unsorted thoughts on the entire topic:

- dirtyfb locking is really heavyweight, it takes all modeset locks
  currently. But it's easy to just push that down into driver callbacks so
  that we can do something saner.

- It's not part of this patch series, but imo if we want to expose a hint
  to userspace that frontbuffer rendering might not be a good idea we
  should do that as a crtc or maybe plane property (Ville's primary plane
  conversion is missing for that to work out perfectly, but meh).

- If we really need properties on framebuffer we imo should add support
  for that in drm core. But I don't think that's required, since we can
  just mark framebuffers as non-legacy as soon as we've seen the first
  ->dirtyfb ioctl call. Doesn't even need any locking since we can only
  ever upgrade from false to true. New userspace which explicitly tells
  the kernel when to invalidate an fb can just do a dummy dirtyfb call at
  alloc time (or not bother if it never does frontbuffer rendering at
  all).

- I don't think we want to have a delayed reenable timer for old userspace
  which doesn't do ->dirtyfb calls. Afaics we've established that we need
  userspace's cooperation, and adding the complexity to reanable power
  saving features for legacy userspace is imo too much trouble. It's also
  something which we'll have a hard time to validate and regression test.

  And I really don't like new code which doesn't have good automated test
  coverage ;-)

- The issue with just killing psr/fbc (i.e. not just disabling it for a
  bit) on the first frontbuffer rendering is that we seem to have too many
  false positives with the current checks. I haven't traced stuff really
  but at least across a pageflip the ->pin_count check is too coarse and
  will fire for the logical new backbuffer if we're unlucky. So I think we
  need to add new, precise (from the userspaces pov) tracking of when a
  gem buffer object is used for psr or fbc:

  bool fbc_target;
  int psr_target;

  I think we need to add those to the gem object and not the framebuffer
  object since that's the place where we can check it in the
  busy/set_domain ioctl. To avoid locking madness this should be protected
  with a new frontbuffer tracking lock. Of course we'll still first check
  pin_count to optimize away the lock taking for most objects.

  Note that we only need a bool for fbc since only ever one framebuffer
  object can be used for fbc, so no refcounting is needed. But potentially
  for psr we can display more than one framebuffer object on a crtc with
  psr enabled, hence the refcount.

- Then we can fully disable fbc if fbc_target is set and fully disable psr
  if psr_target is set. And if we update them in setcrtc/setplane and
  for pageflips we'll always know which objects are logically the
  frontbuffer and won't have a mess with false-positive invalidation
  events due to us pipelining everthing.

  For legacy userspace we'd only ever reanable psr/fbc again at modeset
  and pageflip time.

  For non-legacy framebuffers (i.e. those where userspace called ->dirtyfb
  at least once) we'll just never set those tracking bits. There's a bit
  of ugliness around when userspace changes the fb from legacy to
  non-legacy mode while it's in use, but I guess we can fix that by simply
  clearing all the tracking bits when that happens (to avoid forgetting
  about an fbc/psr reference).

- To make this really useful (and also make sure it's properly tested) I
  think we should abolish all the hw frontbuffer rendering tracking in the
  gtt and system agent. We only need the hw to invalidate fbc/psr on an
  mmio write to the buffer address (or an MI_FLIP), which seems to be
  enabled unconditionally (excluding frobbing debug registers). For fbc (I
  haven't looked too closely at the psr patches) we certainly need to fix
  up our code to no longer disable fbc temporarily across a pageflip -
  that was only necessary with fbc1, not with fbc2 (which we have since
  about gm45).

  This way we should be able to work around the performance downsides and
  for psr also be able to enable psr when we have sprites displayed on the
  screen. And since modern compositors use sprites a lot this is imo
  important.

- Testing: I'm somewhat hopeful that we could check whether all this
  invalidation tracking works correctly with the pipe CRC computations.
  And if we disable all hw tracking failure to disable fbc/psr or flush
  stuff should _really_ show up badly even for manual testing.

Cheers, Daniel

> -Chris
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch