[Intel-gfx] [RFC 0/3] Engine utilization tracking

Wed May 10 08:38:24 UTC 2017

On 09/05/2017 19:11, Dmitry Rogozhkin wrote:
> On 5/9/2017 8:51 AM, Tvrtko Ursulin wrote:
>> On 09/05/2017 16:29, Chris Wilson wrote:
>>> On Tue, May 09, 2017 at 04:16:41PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 09/05/2017 15:26, Chris Wilson wrote:
>>>>> On Tue, May 09, 2017 at 03:09:33PM +0100, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>
>>>>>> By popular customer demand here is the prototype for cheap engine
>>>>>> utilization
>>>>>> tracking.
>>>>>
>>>>> customer and debugfs?
>>>>
>>>> Well I did write in one of the following paragraphs on this topic.
>>>> Perhaps I should have put it in procfs. :) Sysfs API looks
>>>> restrictive or perhaps I missed a way to get low level (fops) access
>>>> to it.
>>>>
>>>>>> It uses static branches so in the default off case it really
>>>>>> should be cheap.
>>>>>
>>>>> Not as cheap (for the off case) as simply sampling RING_HEAD/RING_TAIL
>>>>
>>>> The off case is three no-op instructions in three places in the irq
>>>> tasklet. And a little bit of object size growth, if you worry about
>>>> that aspect?
>>>
>>> It's just how the snowball begins.
>>
>> We should be able to control it. We also have to consider which one is
>> lighter for this particular use case.
>>
>>>>> which looks to be the same level of detail. I wrapped all this up in a
>>>>> perf interface once upon a time...
>>>>
>>>> How does that work? Via periodic sampling? Accuracy sounds like it
>>>> would be proportionate to the sampling frequency, no?
>>>
>>> Right, and the sampling frequency is under user control (via perf) with
>>> a default of around 1000, which gives a small systematic error when
>>> dealing with %
>>>
>>> I included power, interrupts, rc6, frequency (and the statistics but I
>>> never used those and dropped them once oa landed), as well as
>>> utilisation, just for the convenience of having sane interface :)
>>
>> Can you resurrect those patches? Don't have to rebase and all but I
>> would like to see them at least.
> Mind that the idea behind the requested kind of stats is primary usage
> by the customers in the _product_ environment to track GPU occupancy and
> predict based on these stats whether they can execute something else.
> Which means that 1) debugfs and any kind of debug-like infrastructure is

Yeah I acknowledged in the cover letter debugfs is not ideal.

I could implement it in sysfs I suppose by doing time based transitions
as opposed to having explicit open/release hooks. It wouldn't make a
fundamental difference to this RFC from the overhead point of view.
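
To illustrate what I mean by time based transitions, here is a rough
model in Python (not kernel code; the class name, the grace period and
the injected clock are all made up for illustration). The assumption is
that a read switches tracking on, and tracking switches itself off again
once nobody has read the file for a while:

```python
class TimedStatsGate:
    """Hypothetical model: reads enable tracking, inactivity disables it."""

    def __init__(self, grace_period_s=2.0):
        # Assumed value for illustration only, not taken from the RFC.
        self.grace_period_s = grace_period_s
        self.enabled = False
        self.last_read = None

    def read(self, now):
        """Called on every read of the (hypothetical) sysfs file."""
        self.enabled = True
        self.last_read = now

    def tick(self, now):
        """Called periodically, e.g. from a timer; disables stale tracking."""
        if self.enabled and now - self.last_read >= self.grace_period_s:
            self.enabled = False
```

So there would be no open/release pair to hook into, only the time of
the last read.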

But most importantly we need to see in detail what Chris' perf based
idea looks like and whether it fits your requirements.

> really a no-option, 2) any kind of restrictions are no-option (like
> disable RC6 states). Also, there is no need to expose low-level detailed
> information like how many EUs and VMEs were in use - this belongs to the
> debug things. As of now the i915 driver exposes only a single required
> metric: gt_act_freq_mhz.

I suppose it doesn't matter if the perf based solution (or any really)
exports more than what you want/need, since it lets you select just the
events you are interested in.

But the overhead and accuracy of both solutions, plus some other 
considerations like maintainability, need to be looked at.
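
As a toy illustration of the accuracy side only, here is a small Python
sketch comparing exact busy time accounting with periodic sampling of a
busy/idle state. The workload is synthetic (busy for 300us out of every
1000us), the function names are made up, and it says nothing about the
real overhead of either approach:

```python
def exact_busy_fraction(intervals, total_us):
    """Busy fraction from exact (start, end) busy intervals in microseconds."""
    return sum(end - start for start, end in intervals) / total_us


def sampled_busy_fraction(intervals, total_us, period_us):
    """Busy fraction estimated by polling the busy/idle state every period_us."""
    ticks = range(0, total_us, period_us)
    busy = sum(1 for t in ticks if any(s <= t < e for s, e in intervals))
    return busy / len(ticks)


# Synthetic workload: 100 periods of 1000us, busy from 150us to 450us in each.
TOTAL_US = 100 * 1000
INTERVALS = [(k * 1000 + 150, k * 1000 + 450) for k in range(100)]

exact = exact_busy_fraction(INTERVALS, TOTAL_US)
# Sampling period locked to the workload period: every sample lands in idle time.
aliased = sampled_busy_fraction(INTERVALS, TOTAL_US, 1000)
# Sampling period unrelated to the workload period: estimate gets close to exact.
unlocked = sampled_busy_fraction(INTERVALS, TOTAL_US, 70)
```

The first sampled case is a contrived worst case where the sampling
period matches the workload period, so it is not representative, but it
shows why the sampling frequency and its relation to the workload matter
when comparing the two.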

Regards,

Tvrtko