[Intel-gfx] hsw rps values regress RPS on Macbook Air

Jesse Barnes jbarnes at virtuousgeek.org
Tue Oct 16 16:53:55 CEST 2012


On Tue, 16 Oct 2012 16:38:02 +0200
Daniel Vetter <daniel at ffwll.ch> wrote:

> On Tue, Oct 16, 2012 at 3:53 PM, Jesse Barnes <jbarnes at virtuousgeek.org> wrote:
> > Any thoughts?  Would collecting the above info in Mesa be pretty easy?
> > I think we already collect FPS info if certain debug flags are set, and
> > frame time seems like a pretty trivial calculation based on some
> > timestamping in a couple of places...
> 
> How does this work with two or more clients rendering to the gpu, each
> pretty much oblivious to what the others are doing (i.e. your
> composited desktop case)? And if that works, how does it figure out
> that the 25 fps video client indeed doesn't want to draw more than
> that on a 50 fps screen (which is presumably the frame target)? Note

How does what figure it out?  A client that wants to draw at 25fps
would use 25fps as its target framerate, not 50, and it would only
request more GPU frequency if it weren't hitting that target.  The
media stack isn't something I've considered much.  I think we can use
the same approach there in general, but things get a little trickier
with encoding and transcoding: you don't necessarily want to run flat
out all the time, yet you don't have an FPS target either, unless
you're doing a realtime transcode for camera->display or something.
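
To make that concrete, here is a rough sketch of what the per-client
loop could look like on the Mesa side.  request_gpu_boost() and
drop_gpu_boost() are made-up names for an interface that doesn't exist
yet; they'd map onto whatever up/down request mechanism the kernel
ends up exposing.

#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ull

/* Stubs standing in for the hypothetical kernel interface; a real
 * version would forward an up/down request to the driver. */
static void request_gpu_boost(void) { }
static void drop_gpu_boost(void) { }

static uint64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Called by the GL stack once per frame with the app's target rate. */
void frame_done(unsigned int target_fps)
{
	static uint64_t last_ns;
	uint64_t t = now_ns();
	uint64_t budget_ns = NSEC_PER_SEC / target_fps;

	if (last_ns && t - last_ns > budget_ns)
		request_gpu_boost();	/* missed the target: ask for more */
	else
		drop_gpu_boost();	/* on target: withdraw our request */

	last_ns = t;
}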

> that the 25 fps case is very tricky, since a lot of workloads with too
> little parallelism between the gpu and cpu (and hence lots of idle
> time on both) run fastest if you lock both the gpu and cpu to the max.
> And since 2d flips a lot between hw/sw rendering, your mostly idle
> desktop is a prime example of this.

We need the GPU to go as fast as the most demanding client requires,
or in the case of multiple competing clients, as fast as is needed to
drive them all at their target framerates.  The GL stack should take
care of issuing the requests to make that happen, based on what I
outlined.
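
The kernel-side arbitration then boils down to taking the max over the
outstanding client requests, clamped to the hw limits.  Something like
this; the names are illustrative, not actual i915 code:

/* Pick the global GPU frequency given each client's current request.
 * Returns hw_min when no client is asking for anything. */
struct client_req {
	unsigned int freq_mhz;		/* frequency this client asked for */
	struct client_req *next;
};

unsigned int pick_gpu_freq(const struct client_req *reqs,
			   unsigned int hw_min, unsigned int hw_max)
{
	unsigned int freq = hw_min;	/* idle default */

	for (; reqs; reqs = reqs->next)
		if (reqs->freq_mhz > freq)
			freq = reqs->freq_mhz;

	return freq > hw_max ? hw_max : freq;
}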

> Also, I totally don't see how that differs from what the hw does, save
> that we don't know the exact algo the hw implements: instead of the hw
> generating up/down requests, userspace does the same. And the pin to
> max is already implemented in sysfs now. The other issue is that
> userspace individually lacks the global overview.

Only the kernel has the global view.  But each client will know
whether it's hitting its target or not, and can request more frequency
if it isn't, whether that's due to its own demands or because other
clients are taking some of the GPU too.

The big difference between this and the hw mechanism is that here
we're basing our frequency requests on what we actually need (the
target framerate).  The hw currently has no idea about this, and can't
really, unless we tell it.  So for example the current code will
increase the GPU frequency beyond nominal when an app runs, even if
the app is already hitting its target framerate; that's less efficient
than just leaving the freq at nominal, which is what this scheme would
do.
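
Roughly, the policy difference looks like this; the step size and
names are illustrative, not what the driver currently does:

/* Target-aware frequency policy: never exceed nominal unless the
 * client is actually missing its frame target.  The hw autotuning,
 * by contrast, ramps past nominal whenever it sees load. */
unsigned int next_freq(unsigned int cur, unsigned int nominal,
		       unsigned int max, int missed_target)
{
	if (!missed_target)
		/* On target: efficiency says stay at or below nominal. */
		return cur > nominal ? nominal : cur;

	/* Behind target: step up, past nominal if need be. */
	return cur + 50 > max ? max : cur + 50;	/* 50 MHz step, illustrative */
}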

Jesse


