[Intel-gfx] RFC frontbuffer write tracking
Daniel Vetter
daniel at ffwll.ch
Thu Jul 18 09:40:53 CEST 2013
On Thu, Jun 20, 2013 at 06:18:27PM +0100, Chris Wilson wrote:
> Having thrown around a few ideas for how to do PSR and FBC write
> tracking on the frontbuffer, including the creation of a new SCANOUT
> domain, the implementation that I've settled on is to detect writes to
> either the GTT domain (hmm, a CPU write flush should also set the fb as
> dirty) and then a deferred SCANOUT flush. The deferred flush then resets
> the write domain and kicks the GTT mapping so that any further dumb
> writes (i.e. dumb kms buffers or fbcon) cause a new frontbuffer
> invalidation. This patch series also introduces framebuffer parameters
> (which maybe should be properties) to allow userspace to opt out of the
> dumb mechanism and elect to call dirtyfb itself when it requires the
> scanout to be flushed.
>
> I've been playing around with GFDT on SNB and IVB as a means for
> prototyping the delayed updates and framebuffer parameters.
Sorry for the massive delay in writing down my review here, it's been
sitting on my table for way too long.
Mostly unsorted thoughts on the entire topic:
- dirtyfb locking is really heavyweight, it takes all modeset locks
currently. But it's easy to just push that down into driver callbacks so
that we can do something saner.
- It's not part of this patch series, but imo if we want to expose a hint
to userspace that frontbuffer rendering might not be a good idea we
should do that as a crtc or maybe plane property (Ville's primary plane
conversion is missing for that to work out perfectly, but meh).
- If we really need properties on framebuffer we imo should add support
for that in drm core. But I don't think that's required, since we can
just mark framebuffers as non-legacy as soon as we've seen the first
->dirtyfb ioctl call. Doesn't even need any locking since we can only
ever upgrade from false to true. New userspace which explicitly tells
the kernel when to invalidate an fb can just do a dummy dirtyfb call at
alloc time (or not bother if it never does frontbuffer rendering at
all).
- I don't think we want to have a delayed reenable timer for old userspace
which doesn't do ->dirtyfb calls. Afaics we've established that we need
userspace's cooperation, and adding the complexity to reanable power
saving features for legacy userspace is imo too much trouble. It's also
something which we'll have a hard time to validate and regression test.
And I really don't like new code which doesn't have good automated test
coverage ;-)
- The issue with just killing psr/fbc (i.e. not just disabling it for a
bit) on the first frontbuffer rendering is that we seem to have too many
false positives with the current checks. I haven't traced stuff really
but at least across a pageflip the ->pin_count check is too coarse and
will fire for the logical new backbuffer if we're unlucky. So I think we
need to add new, precise (from the userspaces pov) tracking of when a
gem buffer object is used for psr or fbc:
bool fbc_target;
int psr_target;
I think we need to add those to the gem object and not the framebuffer
object since that's the place where we can check it in the
busy/set_domain ioctl. To avoid locking madness this should be protected
with a new frontbuffer tracking lock. Of course we'll still first check
pin_count to optimize away the lock taking for most objects.
Note that we only need a bool for fbc since only ever one framebuffer
object can be used for fbc, so no refcounting is needed. But potentially
for psr we can display more than one framebuffer object on a crtc with
psr enabled, hence the refcount.
- Then we can fully disable fbc if fbc_target is set and fully disable psr
if psr_target is set. And if we update them in setcrtc/setplane and
for pageflips we'll always know which objects are logically the
frontbuffer and won't have a mess with false-positive invalidation
events due to us pipelining everthing.
For legacy userspace we'd only ever reanable psr/fbc again at modeset
and pageflip time.
For non-legacy framebuffers (i.e. those where userspace called ->dirtyfb
at least once) we'll just never set those tracking bits. There's a bit
of ugliness around when userspace changes the fb from legacy to
non-legacy mode while it's in use, but I guess we can fix that by simply
clearing all the tracking bits when that happens (to avoid forgetting
about an fbc/psr reference).
- To make this really useful (and also make sure it's properly tested) I
think we should abolish all the hw frontbuffer rendering tracking in the
gtt and system agent. We only need the hw to invalidate fbc/psr on an
mmio write to the buffer address (or an MI_FLIP), which seems to be
enabled unconditionally (excluding frobbing debug registers). For fbc (I
haven't looked too closely at the psr patches) we certainly need to fix
up our code to no longer disable fbc temporarily across a pageflip -
that was only necessary with fbc1, not with fbc2 (which we have since
about gm45).
This way we should be able to work around the performance downsides and
for psr also be able to enable psr when we have sprites displayed on the
screen. And since modern compositors use sprites a lot this is imo
important.
- Testing: I'm somewhat hopeful that we could check whether all this
invalidation tracking works correctly with the pipe CRC computations.
And if we disable all hw tracking failure to disable fbc/psr or flush
stuff should _really_ show up badly even for manual testing.
Cheers, Daniel
> -Chris
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list