[Intel-gfx] [PATCH 11/9] drm/i915: Opt out of vblank disable timer on >gen2

Thu Nov 19 13:27:37 PST 2015

On Thu, Nov 19, 2015 at 10:06:01PM +0200, Ville Syrjälä wrote:
> On Thu, Nov 19, 2015 at 05:44:51PM -0200, Paulo Zanoni wrote:
> > 2014-05-26 11:26 GMT-03:00  <ville.syrjala at linux.intel.com>:
> > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > >
> > > Now that the vblank races are plugged, we can opt out of using
> > > the vblank disable timer and just let vblank interrupts get
> > > disabled immediately when the last reference is dropped.
> > >
> > > Gen2 is the exception since it has no hardware frame counter.
> > 
> > Hi
> > 
> > Remember last week's FBC vblank optimization patch that had an
> > erroneous drm_crtc_vblank_get() instead of drm_crtc_vblank_count()?
> > After I fixed the bug in the patch I realized that it was the
> > unbalanced vblank_get() call that moved PC state residency up.
> > 
> > I did some experiments, and on my specific BDW machine, after running
> > "powertop --auto-tune", I get about 15-25% PC7 residency without FBC.
> > If I revert this patch, the number jumps to 40-45%. With FBC, the PC7
> > residency goes from 60-70% to 85-90% when I revert this patch. I'm
> > running just an idle Cinnamon with an open terminal.
> > 
> > So, since the commit message lacks more details, what are the
> > downsides of reverting this patch? What are the advantages of opting
> > out of the vblank timer? I see my desktop does tons and tons of vblank
> > get/put calls per second, so the disable timer makes a lot of sense.
> 
> "Idle" desktop :(
> 
> Really the immediate disable should save power. Where are these tons of
> vblank get/puts coming from actually? I would assume you'd get a handful
> per frame at most, and that when you're actually doing something. On an
> idle system I would expect nothing at all happens during most frames.

Yes, with the immediate disable we end up with a few get/put per frame
of rendering, as userspace queries and queues the next flip/swap. With
more clients, there are more opportunities for queries prior to queuing
the swap. It really shouldn't be more than a handful per frame if all
the clients are vrefresh limited. (The oddity was the vblank_mode=0
rendering where we still maintained the handful of get/put per frame...)

> > I also wish there was some easy way to check how this patch (or its
> > revert) affect a bunch of different workloads...

Maybe add the PC residencies to the power meter in intel-gpu-overlay,
alongside the RC residencies?

> > (Also CCing Chris for insightful comments on performance)
> 
> IIRC Chris had a patch to not disable the interrupt immediately when
> the refcount drops to 0, but instead delay the disable until the next
> interrupt actually happens. But I guess it didn't go in? Probably I
> should have reviewed it but didn't. It sounds like a decent idea to
> me in any case for the active use case.

The discussion went off into the barriers around the vblank counter, and
I forgot to bring it up again. I think even Mario was happy enough about
it.

Paulo, you may want to try

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=ef97f4680d316cc8f9656dded1e1e41544c7225e

or you can change the KEEPALIVES value in src/sna/sna_dri2.c for a
similar effect.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre