[Bug 102199] Kabylake has poor performance, doesn't upclock during activity quickly with single display configurations

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Aug 21 17:17:44 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=102199

--- Comment #22 from Lyude Paul <lyude at redhat.com> ---

(In reply to Ray Strode [halfline] from comment #18)
> (In reply to Chris Wilson from comment #17)
> > The gfx driver doesn't have that information. I'm not sure if the kernel
> > knows what is an active hid, and certainly there's no link to it to either
> > the cpu scheduler or to us. 
> I don't understand, the kernel sends input events from
> keyboard/mice/touchscreens to userspace via /dev/input/eventN interface, so
> why couldn't the clock speed get boosted any time an event got posted to one
> of those devices?
Mentioned this to halfline over the RH IRC already, but putting it on the bz
for the record: I'm with ickle on this; while the patch for this is technically
kernel heuristics, it's heuristics based on data the driver has much more
knowledge of than something like an input device. Especially since the events
that come out of an evdev node are usually not going to be 1:1 with the events
that userspace ends up receiving, since we have layers like libinput doing
additional filtering and heuristics on the input events before handing them to
the compositor.

Plus, it's certainly possible that we could have various render jobs that need
to be completed by the scanout period without actually having any user input.
> 
> git grep seems to suggest there's a function called input_register_handler()
> that lets various parts of the kernel hook into the event stream for all
> devices (granted this was just 60 seconds of searching, so I may be missing
> something).
> 
> > From our perspective the only way we can
> > identify such streams is by looking at what ends up on the display, and
> > backtracking from there. The piece that pulls all of this together is the
> > compositor.
> But wouldn't it better if we could ramp up the clockspeed before firefox
> started rerendering a page getting drag scrolled, not after the compositor
> is responding to damage requests?
I think the issue here is that we really can't predict the future of render
jobs unless we're the ones creating those jobs. The only thing we really know
is when those jobs must be completed by, which for any application drawing to
the screen (including gnome-shell, firefox, etc.) is the next scanout.

To be honest, I very much like ickle's idea of having render batch deadlines,
since we could dynamically adjust the boost curve based on the amount of time
we have left from the start of the render job until its deadline, assuming we
mark all render jobs for things that are destined to be scanned out to the
screen as having a vblank deadline. I'm thinking something that would look like
this:

(each tick = 1ms)
(U = boost up) (D = boost down)
(S = start) (E = end)
(dividers going from top to bottom indicate the start of a new scanout
period)
(frequencies in MHz, and are all guesstimates)

So, let's assume we're looking at the scanout timeline for when the user starts
interacting with, say, Firefox:
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|         |S|   |U|   |U| |E|   | |D|   |D|               |S| |E|
|                               |                          U    |
300             500  1000         500   300              1000
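To make the timeline above concrete, here's a minimal sketch of what a deadline-proportional boost curve could look like. This is purely illustrative: the frequency table, the 16.6ms scanout period, and the linear ramp are all assumptions for the sake of the example, not the actual i915 rps code.

```python
# Illustrative sketch only: pick a GPU frequency based on how much headroom
# remains between a time-critical job's expected runtime and the next vblank.
# The frequencies and thresholds are guesstimates, matching the diagram above.

FREQ_MIN = 300    # MHz, efficient/idle frequency
FREQ_MAX = 1000   # MHz, maximum boost frequency

def boost_frequency(time_to_vblank_ms, expected_job_ms, vblank_interval_ms=16.6):
    """The less headroom before the next vblank when the job starts,
    the harder we boost."""
    headroom = time_to_vblank_ms - expected_job_ms
    if headroom <= 0:
        # Already on track to miss the deadline: go straight to max.
        return FREQ_MAX
    # Scale the boost inversely with headroom as a fraction of the
    # scanout period: lots of headroom -> stay low, little -> ramp up.
    urgency = 1.0 - min(headroom / vblank_interval_ms, 1.0)
    return int(FREQ_MIN + urgency * (FREQ_MAX - FREQ_MIN))
```

A job starting right after a vblank with plenty of headroom would stay near the low end of the curve, while one starting a few milliseconds before scanout would jump toward the top.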

So, the idea would basically be: the closer we are to the next vblank when the
time-critical job starts, the faster we ramp up with boosts. We could probably
optimize this even more by holding a boost that's done in anticipation of the
next scanout across the next vblank. This might need a bit more consideration
for edge cases such as a render job that is destined to end up on more than one
display, but I think just falling back to "boost more" in those scenarios, like
we do right now, would be sufficient since power isn't as important with >1
monitor. Additionally, we could also combine ickle's suggestion of putting
high-priority jobs like this at the top of the queue and ditching anything
else, and do that for the duration of the vblank or up until the point at which
all time-sensitive batches are complete.
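That queue-reordering idea could be sketched roughly like this. Again, this is a toy model for illustration only; the class and flag names are made up and the real i915 scheduler looks nothing like this.

```python
from collections import deque

# Toy model: batches flagged as vblank-deadline-sensitive jump ahead of
# everything else, and normal work only runs once the urgent work drains.

class ToyBatchQueue:
    def __init__(self):
        self._urgent = deque()   # batches with a vblank deadline
        self._normal = deque()   # everything else

    def submit(self, batch, vblank_deadline=False):
        (self._urgent if vblank_deadline else self._normal).append(batch)

    def next_batch(self):
        # Urgent batches preempt the normal queue entirely.
        if self._urgent:
            return self._urgent.popleft()
        return self._normal.popleft() if self._normal else None
```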

Of course, ickle has already mentioned that we're a long way off from having
SCHED_DEADLINE, but maybe we could at least use this logic for any
EGL_IMG_context_priority jobs instead of just blindly boosting ahead of time?
> 
> > Because we are looking for anything that interacts with the user to
> > preferentially give those low latency. We identify these by contexts which
> > are tied to files and processes. On the other hand, we can remove that
> > guesswork if userspace is able to tell us what is high priority and what is
> > not.
> Okay, so it's not just video card in turbo mode or power save mode, but also
> certain GL contexts getting preferential access to the card.  Seems like a
> worthwhile thing to have, I guess, but giving gnome-shell preferential
> access doesn't help firefox rerender faster, right ?
