[Intel-gfx] [PATCH v2] drm/i915: Keep the forcewake timer alive for 1ms past the most recent use

Mon May 15 12:06:47 UTC 2017

On 15/05/2017 11:41, Chris Wilson wrote:
> On Mon, May 15, 2017 at 11:14:32AM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/05/2017 23:16, Chris Wilson wrote:
>>> Currently the timer is armed for 1ms after the first use and is killed
>>> immediately, dropping the forcewake as early as possible. However, for
>>
>> Correct for implicit grabs, but for explicit it is 1-2ms after the last use.
>
> From the put of the first, we don't rearm the timer on later puts.

What do you mean by first? I see the timer getting armed on the last put.

>>> very frequent operations the forcewake dance has a large impact on
>>> latency and keeping the timer alive until we are idle is preferred. To
>>
>> What workloads see the difference and by how much?
>
> The issue I have is that we can't submit nops fast enough using

Fast enough for what? You mean just for your liking, i.e. we can be faster?

> lite-restore. A large part of that was from the rpm get/put in the
> tasklet, but I suspect ultimately it is the extra mmio/lite-restore that
> is slowing us down. Now, I'm happy that the lite-restore to keep port[1]
> accessible for the next context is a benefit, so I'm looking at how we
> can improve the continual resubmission.

Ok.

>> At the time I fixed the auto-release to go from 0-1 jiffies to
>> 1-2ms, we talked about this conundrum - whether to consider the
>> first grab or last put for the timer. But we decided thorough
>> testing is needed to see if this would make a difference and what
>> power side effects it might have.
>
> In the above scenario, it never goes off so we are paying the worst
> price of a useless dance. It's the periodic 1ms poll on an idle system
> that will suffer most, but in practice this will delay turning off by an
> extra 1ms - and that may be the difference between 0.1% and 50% in power
> consumption :|

What is a periodic 1ms poll on the idle system? If the system is idle 
the auto-release timer will not be running.

In the rapid short submission scenario I see the auto-release timer 
potentially racing with the tasklet and needlessly dropping the fw. But 
rapid short submission is so much faster than the 1-2ms auto-release 
that this race must be very infrequent.

I instead expect mostly to see the timer run and find 
domain->wake_count > 0 due to a tasklet running in parallel which has 
grabbed the fw for itself.

The worst-case scenario sounds like it would be some submission period 
around the auto-release period but just shifted in "phase", no?

So I do see some benefit, but would just want to see some numbers in the 
commit message and a more precise description of the scenario it improves.

Regards,

Tvrtko

>>> achieve this, if we call intel_uncore_forcewake_get whilst the timer is
>>> alive (repeated use), then set a flag to restart the timer on expiry
>>> rather than drop the forcewake usage count. The timer is racy; the
>>> consequence of the race is to expire the timer earlier than is now
>>> desired but does not impact on correct behaviour. To offset the race
>>> slightly, we set the active flag again on intel_uncore_forcewake_put.
>>
>> Using the hrtimer API to modify the timer was too expensive?
>
> In the past it has been unsuitable for frequent adjustment, and we may
> be using I915_READ every few instructions.
> -Chris
>
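
For readers following the thread, the restart-on-expiry scheme described in the quoted commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the i915 code: `struct fw_domain` here is a simplified stand-in, `fw_get`, `fw_put` and `fw_timer_expire` are hypothetical names, and locking and the real hrtimer are left out. The `wake_count` and `active` fields follow the names used in the discussion, and the timer is assumed to hold one reference of its own while armed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one forcewake domain (not driver code). */
struct fw_domain {
    unsigned int wake_count; /* references held, including the timer's own */
    bool timer_armed;        /* auto-release timer is pending */
    bool active;             /* domain was used while the timer was pending */
    bool hw_awake;           /* models the hardware forcewake state */
};

/* Take a reference; wake the hardware on the first one. */
static void fw_get(struct fw_domain *d)
{
    if (d->wake_count++ == 0)
        d->hw_awake = true;
    if (d->timer_armed)
        d->active = true;    /* repeated use: ask the timer to restart */
}

/* Drop a reference; the last one is handed over to the timer. */
static void fw_put(struct fw_domain *d)
{
    assert(d->wake_count > 0);
    if (d->timer_armed) {
        d->wake_count--;     /* the timer still holds its reference */
        d->active = true;    /* set again on put to narrow the race */
        return;
    }
    if (d->wake_count == 1) {
        d->timer_armed = true; /* keep the reference, owned by the timer */
        return;
    }
    d->wake_count--;
}

/* Timer expiry; returns true if the timer should be restarted. */
static bool fw_timer_expire(struct fw_domain *d)
{
    if (d->active) {
        d->active = false;   /* recent use seen: stay awake one more period */
        return true;
    }
    assert(d->wake_count > 0);
    d->timer_armed = false;
    if (--d->wake_count == 0)
        d->hw_awake = false; /* nobody else holds the domain: release it */
    return false;
}
```

In this model a get/put pair that arrives while the timer is pending makes the next expiry restart the timer instead of releasing the hardware, and only an expiry with no intervening use drops the final reference. If an expiry finds another holder (the tasklet case raised above), `wake_count` stays above zero and the hardware remains awake.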