[Bug 102199] Kabylake has poor performance, doesn't upclock during activity quickly with single display configurations

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Aug 21 11:17:21 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=102199

--- Comment #21 from Chris Wilson <chris at chris-wilson.co.uk> ---
(In reply to Ray Strode [halfline] from comment #20)
> (In reply to Chris Wilson from comment #19)
> > There is no way we can predict which of those will result in a requirement
> > for low latency output.
> Input usually leads to screen updates...  Granted not all screen updates
> take longer than a vblank period when the gpu is downclocked... Still, it
> would certainly ramp up less often than glFinish before every swap buffer in
> the compositor right?

I would have said that most input events do not result in a new request. My gut
says that it is the reverse, that we will be boosting for longer
inappropriately by responding to each input event. Still, it is a mechanism
worth thinking about.

> > You do not want to bake policy into the kernel,
> But isn't everything we're talking about here heuristics (missed vblank,
> wait boosting)?

Yes, it doesn't belong here; it's just the current hack. One side of the
spectrum is having the autotune work well for the vast majority, though even
that follows a rough policy that can be adjusted (min/max, but we don't allow
aggressiveness to be tuned); the other side is letting userspace override that
when necessary.

> > the other hand it may be a worthwhile experiment for userspace to construct
> > that link (i.e. pass a input fd to the gfx that it then watches, and ????)
> not sure what you mean by "pass an input fd to the graphics."  certainly if
> there was an ioctl "PLEASE BOOST THE CLOCKSPEED OF THE GPU NOW", gnome-shell
> could call it before starting big animations, and whenever the user clicked
> or typed.

The "pass an fd" was about tying an input device to the gfx so it could
autoboost on an event; the rest of this has been about sketching out how to
make an ioctl that could do forced boosts.

> > Realistically that will just result in the GPU always running at max
> > frequency, or idling.
> why? if the user is watching a video they aren't using the mouse or
> typing...  Likewise, if they're reading a webpage, or if they're not using
> the computer.
> 
> But if they're moving the mouse around, they're probably going to click
> something. if they click something, it's probably going to fade open or
> scroll or whatever.  it seems like that would be the right time to scale up
> the clock?

Maybe, maybe not. It's not a decision I want to make, but I don't begrudge
providing a mechanism for someone else to dictate that policy.

> > We may as well just set a flag on the fd (that the
> > compositor hands to its clients) that says render-as-fast-as-possible, or
> > just switch off reclocking all together.
> not following this.

Earlier you pointed out that the gfx pipeline is long, and to deliver the
input-output results as early as possible we should have started that pipeline
at max clocks, and not just when we realise that we are about to write to the
framebuffer. The alternative is that when delivering a screen update, you only
include those bits that are completed, then your high priority update can skip
to the head of the queue -- and should be able to perform well enough without
an explicit boost, just autotuning of the workload as a whole.

> > What we are talking about here is a mechanism to override the workload based
> > reclocking of the GPU for a much faster initial ramp (otherwise it uses an
> > exponential curve, reclocking every 10-30ms); it's a power-performance
> > tradeoff. The gpu can decode a video at minimal clocks, there is no reason
> > for that workload to be consuming more power than required. So the key is
> > detecting what workload is going to miss a latency deadline and only
> > supercharging the gpu for that task.
> Sure, this makes sense to me. But do you think there are a lot of rendering
> tasks that result directly from user input, that wouldn't benefit from
> faster initial ramp?

Yes. Interactive latency is important, but we should be able to deliver it at a
better point on the power curve than max clocks. 
> 
> Thinking about it, I guess an obvious one would be typing a document. hmm.
> so that makes me think we shouldn't do it for keyboard events. But of
> course, hitting the super key should still open the shell overview fast. hmm.

Yup, measuring the output delay of something well defined like the super-key,
and setting an expectation on it, is exactly what we need to do. Is this what
gnome-shell-perf-tool covers?

Going beyond that, tracking an input event through to a client output request
is trickier, but could be done for simple applications (or at least mock apps).

But if you can think of more tools like gnome-shell-perf-tool that set
expectations against which we can measure ourselves, that would be most
useful. We are not as good as we should be at integrating such tests into our
CI, but that we can improve.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the QA Contact for the bug.
You are the assignee for the bug.