[Intel-gfx] [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Apr 5 10:02:15 UTC 2016


On 05/04/16 09:59, Chris Wilson wrote:
> On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote:
>>
>> On 04/04/16 19:58, Chris Wilson wrote:
>>> On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Current implementation releases the forcewake at any time between
>>>> straight away, and one jiffie from the last put, or first automatic
>>>> grab.
>>>
>>> That isn't the problem though. The problem is that we set the timer on
>>> first use rather than last use. All you are stating here is that by
>>> lengthening the timeout on your system you reduce the number of times it
>>> times out.
>>
>> It is true the reduction I see is due lengthening of the average timeout.
>>
>> But with regards to re-arming approach, I thought so initially
>> myself, but then, if we expect bursty access then it shouldn't
>> matter and the simplicity of doing it like it currently is better.
>>
>> Even in practice, I noticed the effect of lengthening the timeout is
>> much greater than moving the arming to the last access. And to get
>> to very few to none auto releases on busy workloads we need in the
>> regions of 5ms, which would be a big change. Or maybe not if you
>> consider HZ=100 kernels.
>>
>> It is very difficult to know what is actually better considering
>> power between the CPU and GPU and performance. So I though the best
>> would be to keep it similar to the current timings, just fix the
>> dependency on HZ and also slack with hrtimers might help with
>> something.
>>
>> As a final data point, explicit puts and auto releases seems to be
>> relatively balanced in my testing. With this patch T-Rex for example
>> auto-releases in the region of 3-4 times in 10ms, with around 5-10
>> explicit gets, and 5-10 implicit gets in 10ms.
>>
>> A different, interrupt heavy workload (~20k irqs/sec) manages
>> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and
>> ~380 implicit per 10ms.
>>
>> Looking at the two I think the fact they manage to auto-release
>> relatively similarly, compared to a huge different in fw gets,
>> suggest burstyness. So I am not sure that any smarts with the
>> release period would be interesting. At least not without serious
>> power/perf measurements.
>>
>>> Having said that, the conversion to hrtimer seems sensible but to add
>>> tracking of the last access as opposed to first we either fallback to
>>> jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
>>> fast enough for every register write. Hmm, my usual response to that has
>>> been if it matters we avoid the heavyweight macros and use the _FW
>>> interface - or even raw readl/writel.
>>>
>>> Could you try storing ktime_get_raw() on every access and rearming the
>>> timer if it expires before last-access + min-period?
>>
>> That would be similar to your patch from before my holiday, right?
>> As I said above, I did not notice much change with that approach.
>> Just extending the timeout has a much greater effect, but as I wrote
>> above, I am not sure we want to change it.
>
> There was just one bug in that patch in checking the expiration that
> makes a huge difference, but if you please add the discussion above to
> the commit log that would be invaluable.

I think I had a fixed version, would have to dig in branches now.

I will add more text to the commit.

Most importantly, do you think this would affect HZ=10 or HZ=250 kernels 
at all? Presumably if it could, someone would know today that there is a 
difference between the kernels based on HZ already?

Regards,

Tvrtko


More information about the Intel-gfx mailing list