[Intel-gfx] [RFC 0/3] Engine utilization tracking

Wed May 10 15:50:01 UTC 2017

On 5/10/2017 1:38 AM, Tvrtko Ursulin wrote:
>
> On 09/05/2017 19:11, Dmitry Rogozhkin wrote:
>> On 5/9/2017 8:51 AM, Tvrtko Ursulin wrote:
>>> On 09/05/2017 16:29, Chris Wilson wrote:
>>>> On Tue, May 09, 2017 at 04:16:41PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 09/05/2017 15:26, Chris Wilson wrote:
>>>>>> On Tue, May 09, 2017 at 03:09:33PM +0100, Tvrtko Ursulin wrote:
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>>
>>>>>>> By popular customer demand here is the prototype for cheap engine
>>>>>>> utilization tracking.
>>>>>>
>>>>>> customer and debugfs?
>>>>>
>>>>> Well I did write in one of the following paragraphs on this topic.
>>>>> Perhaps I should have put it in procfs. :) Sysfs API looks
>>>>> restrictive or perhaps I missed a way to get low level (fops) access
>>>>> to it.
>>>>>
>>>>>>> It uses static branches so in the default off case it really
>>>>>>> should be cheap.
>>>>>>
>>>>>> Not as cheap (for the off case) as simply sampling RING_HEAD/RING_TAIL
>>>>>
>>>>> The off case is three no-op instructions in three places in the irq
>>>>> tasklet. And a little bit of object size growth, if you worry about
>>>>> that aspect?
>>>>
>>>> It's just how the snowball begins.
>>>
>>> We should be able to control it. We also have to consider which one is
>>> lighter for this particular use case.
>>>
>>>>>> which looks to be the same level of detail. I wrapped all this up in a
>>>>>> perf interface once upon a time...
>>>>>
>>>>> How does that work? Via periodic sampling? Accuracy sounds like it
>>>>> would be proportionate to the sampling frequency, no?
>>>>
>>>> Right, and the sampling frequency is under user control (via perf), with
>>>> a default of around 1000, which gives a small systematic error when
>>>> dealing with %
>>>>
>>>> I included power, interrupts, rc6, frequency (and the statistics but I
>>>> never used those and dropped them once oa landed), as well as
>>>> utilisation, just for the convenience of having a sane interface :)
>>>
>>> Can you resurrect those patches? Don't have to rebase and all but I
>>> would like to see them at least.
>> Mind that the idea behind the requested kind of stats is primarily usage
>> by the customers in the _product_ environment to track GPU occupancy and
>> predict based on these stats whether they can execute something else.
>> Which means that 1) debugfs and any kind of debug-like infrastructure is
>
> Yeah I acknowledged in the cover letter debugfs is not ideal.
>
> I could implement it in sysfs I suppose by doing time based transitions
> as opposed to having explicit open/release hooks. It wouldn't make a
> fundamental difference to this RFC from the overhead point of view.
>
> But most importantly we need to see in detail what Chris' perf based
> idea looks like and whether it fits your requirements.
>
>> really not an option, 2) any kind of restrictions are not an option (like
>> disabling RC6 states). Also, there is no need to expose low-level detailed
>> information like how many EUs and VMEs were in use - this belongs to the
>> debug things. As of now the i915 driver exposes only a single required
>> metric: gt_act_freq_mhz.
>
> I suppose it doesn't matter if the perf based solution (or any really) 
> exports more than what you want/need since it is such that you can 
> select the events you are interested in.
>
> But the overhead and accuracy of both solutions, plus some other 
> considerations like maintainability, need to be looked at.
>

Let's review both solutions. I am not against it.
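
To put a rough number on the accuracy question raised above (periodic
sampling versus exact accounting), here is a minimal sketch. It is not the
i915 or perf code: the function names, the integer-microsecond timeline and
the workload (a 300 us batch every 1700 us for one second) are all
assumptions made up for illustration.

```python
from bisect import bisect_right

def exact_utilization(busy, duration_us):
    """Fraction of [0, duration_us) covered by (start, end) busy intervals."""
    return sum(end - start for start, end in busy) / duration_us

def sampled_utilization(busy, duration_us, hz):
    """Estimate utilization by polling the busy/idle state hz times a second.

    busy must be sorted, non-overlapping (start, end) pairs in microseconds.
    """
    starts = [start for start, _ in busy]
    period_us = 1_000_000 // hz          # integer sampling period
    samples = duration_us // period_us   # number of polls in the window
    hits = 0
    for i in range(samples):
        t = i * period_us
        k = bisect_right(starts, t) - 1  # last interval starting at or before t
        if k >= 0 and t < busy[k][1]:
            hits += 1
    return hits / samples

# Hypothetical workload: 588 batches of 300 us, one every 1700 us.
DURATION_US = 1_000_000
busy = [(i * 1700, i * 1700 + 300) for i in range(588)]

exact = exact_utilization(busy, DURATION_US)
for hz in (100, 1000, 10000):
    est = sampled_utilization(busy, DURATION_US, hz)
    print(f"{hz:>6} Hz: sampled={est:.4f} exact={exact:.4f} "
          f"error={abs(est - exact):.4f}")
```

With a synthetic workload like this the sampled figure stays within a
fraction of a percentage point of the exact one, but the error depends on
how the batch period lines up with the sampling period, so this only
illustrates the trade-off and is not a measurement of either proposal.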

> Regards,
>
> Tvrtko