[Intel-gfx] hsw rps values regress RPS on Macbook Air

Tue Oct 16 21:50:22 CEST 2012

Jesse Barnes <jbarnes at virtuousgeek.org> writes:

> On Fri, 12 Oct 2012 11:34:08 -0700
> Eric Anholt <eric at anholt.net> wrote:
>
>> Jesse Barnes <jbarnes at virtuousgeek.org> writes:
>> 
>> > On Tue, 09 Oct 2012 13:05:54 -0700
>> > Eric Anholt <eric at anholt.net> wrote:
>> >
>> >> On my new MBA with danvet's drm-intel-next-queued, I'm not getting
>> >> working RPS.  vblank_mode=0 glxgears never ups the frequency, and
>> >> vblank_mode=0 openarena only makes it up to 500MHz.  Reverting
>> >> 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c gets the machine to responsive
>> >> RPS: fully on while the GPU is busy, fully lowered when it's not.
>> >> 
>> >> Since we're always just looking for all-on or all-off and never see
>> >> workloads that actually want to be somewhere in between, could we please
>> >> just move to race to idle for RPS?
>> >
>> > Ramping to the max freq is fine for benchmarking.  But for normal
>> > vblank throttled activity, using the lowest freq (assuming it's
>> > above our nominal freq) that can hit the refresh is the right answer
>> > from a power perspective.
>> 
>> Have you seen any workloads where a middle frequency value is actually
>> chosen by the current RPS system?
>
> I can't tell if this is a snarky response or not. :)  But either way it
> misses my point: I think the current RPS system isn't ideal for many of
> our workloads and the way our GL stack runs things.  I've thought we
> could do better for a while now but couldn't think of a way that would
> let userspace request lower frequencies if it didn't need the extra
> processing power, but if we collect a little data in Mesa maybe we can
> do it.

It's not snarky.  I'm really wondering whether you've actually seen the
middle frequencies that this software is designed to choose.  I spend a
lot of time looking at performance, and whenever I look at RPS state,
it's either at highest or lowest, or not working.  I've never seen a
functioning workload that stays at the middle.  Testing with the hsw
values reverted:

openarena vblank-synced at 60fps (can do 240), my clock bounces between
350 and 1150 MHz.

nexuiz vblank-synced isn't hitting 60 here, and the clock isn't getting
all the way to 1150 (I think I saw 900 and 1100 a bunch), and the CPU
isn't the bottleneck as measured by top.  So this is the only case of
these 3 that's actually choosing a middle frequency, but it shouldn't
be: it's missing 60fps, so it ought to be at the full clock.

I even tried using glxgears vblank-synced, resizing my window.  As I
scale up, it seems to spend more time at 1150 instead of 350, but it's
not choosing something in the middle, even though it seems like the most
obvious workload for this middle frequency support.

I'd like to replace "not working" with "high when busy, low when not",
while you're saying that we have to support a middle frequency like the
complicated software is trying to achieve.
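
For concreteness, the "high when busy, low when not" policy amounts to
something like the sketch below.  The function names and the boolean
busy flag are hypothetical, the 350 and 1150 values are just the clocks
observed on this machine in the tests above, and how "busy" would
actually be detected is left out entirely.

```python
MIN_MHZ = 350    # lowest clock seen in the tests above (machine-specific)
MAX_MHZ = 1150   # highest clock seen in the tests above (machine-specific)


def race_to_idle_mhz(gpu_busy):
    """Hypothetical policy: full clock while busy, lowest clock otherwise."""
    return MAX_MHZ if gpu_busy else MIN_MHZ


def trace(busy_samples):
    """Map a sequence of busy/idle samples to the clock chosen for each."""
    return [race_to_idle_mhz(busy) for busy in busy_samples]
```

There is no middle state to tune here, which is the point: the only
input is whether the GPU has work queued.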

> I propose a new ioctl, I915_FREQ_REQUEST, with 3 different parameters,
> I915_MAX_FREQ, I915_MORE_FREQ, and I915_LESS_FREQ.  The first would
> tell the kernel the app would like to run at the maximum possible
> speed, regardless of power or throttling considerations.  MORE
> would simply tell the kernel the app needs a higher frequency to meet
> its frame rate target, and LESS would tell the kernel it could run
> slower and still hit its target.
>
> In Mesa, we'd need to track the FPS target for the app, the current FPS
> (e.g. over the last second, or using a decaying average with some
> weight toward recent activity), and the time between swapbuffers calls
> (as an approximation of how long it takes us to draw each frame).
>
> Periodically (maybe every second when we update our current FPS), Mesa
> would either request more frequency if it wasn't hitting its FPS
> target, or less frequency if its frame draw time was less than 90% of
> the maximum allotted frame time (the period for the frequency we're
> trying to hit).  The FPS target would be based on the swap interval for
> the app.
>
> In a benchmarking mode (i.e. vblank_mode=0 or swapinterval set to 0),
> we could just make an I915_MAX_FREQ request and be done with it.
>
> Within the kernel, we'd evaluate every app's requests and choose the
> max frequency requested, re-setting things on every ioctl call and when
> apps close.
>
> Any thoughts?  Would collecting the above info in Mesa be pretty easy?
> I think we already collect FPS info if certain debug flags are set, and
> frame time seems like a pretty trivial calculation based on some
> timestamping in a couple of places...
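
The proposal quoted above can be sketched roughly as follows.  This is
an illustration only: I915_FREQ_REQUEST does not exist, the helper names
and the refresh_hz parameter are made up, and the proposal does not say
how the kernel would turn MORE/LESS into an actual frequency, so the
sketch simply assumes the kernel keeps one wanted frequency per client
and takes the maximum.

```python
from enum import Enum


class FreqRequest(Enum):
    # Parameter names come from the proposal; the ioctl is hypothetical.
    MAX = "I915_MAX_FREQ"
    MORE = "I915_MORE_FREQ"
    LESS = "I915_LESS_FREQ"


def choose_request(swap_interval, refresh_hz, current_fps, frame_time_s):
    """Mesa-side decision, made each time the FPS estimate is updated."""
    if swap_interval == 0:
        # Benchmarking mode (vblank_mode=0 or swap interval 0).
        return FreqRequest.MAX
    target_fps = refresh_hz / swap_interval
    frame_budget_s = 1.0 / target_fps
    if current_fps < target_fps:
        return FreqRequest.MORE
    if frame_time_s < 0.9 * frame_budget_s:
        return FreqRequest.LESS
    return None  # on target, nothing to request


def kernel_pick_mhz(wanted_mhz_by_client, min_mhz=350):
    """Kernel side (assumed): highest frequency any open client wants."""
    return max(wanted_mhz_by_client.values(), default=min_mhz)
```

Note that choose_request() needs swap_interval as an input, which is
where this runs into trouble for apps that never set one.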

Unfortunately, enough apps don't use swap interval, and instead use
SGI_video_sync or OML_sync_control.  In that case, we don't know the
swap interval outside of a blocking call, unless we look at their
history and try to guess.  It sounds ugly, and I guess we'd basically
end up with I915_MAX_FREQ as our policy.

The design is also predicated on some bad assumptions.  One is that
frame-to-frame workloads stay consistent.  3x difference in work between
high and low-framerate scenes within an app I'd say is normal, and you'd
need to be able to recognize that change and fix the frequency within
half a second in the worst case I'd think.  Think about your compositor,
too: right now it's updating a character at a time as I type, then I go
hit the expose button and it has to redraw the whole screen and that's a
waaay different workload.  I want responsiveness.

The other bad assumption I think is that there's a bunch of headroom for us
to reduce the frequency.  Games are tuned to the hardware, to be able to
barely hit 60fps -- if you're way over 60, then either the app turns on
more pretty graphics options or you do.  You don't have a bunch of extra
space to play with turning down the frequency.