[Intel-gfx] [RFC 0/3] Engine utilization tracking

Daniel Vetter daniel at ffwll.ch
Wed May 10 19:45:24 UTC 2017


On Wed, May 10, 2017 at 10:38 AM, Tvrtko Ursulin
<tvrtko.ursulin at linux.intel.com> wrote:
>
> On 09/05/2017 19:11, Dmitry Rogozhkin wrote:
>>
>> On 5/9/2017 8:51 AM, Tvrtko Ursulin wrote:
>>>
>>> On 09/05/2017 16:29, Chris Wilson wrote:
>>>>
>>>> On Tue, May 09, 2017 at 04:16:41PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>
>>>>> On 09/05/2017 15:26, Chris Wilson wrote:
>>>>>>
>>>>>> On Tue, May 09, 2017 at 03:09:33PM +0100, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>>>
>>>>>>> By popular customer demand here is the prototype for cheap engine
>>>>>>> utilization tracking.
>>>>>>
>>>>>>
>>>>>> customer and debugfs?
>>>>>
>>>>>
>>>>> Well I did write in one of the following paragraphs on this topic.
>>>>> Perhaps I should have put it in procfs. :) Sysfs API looks
>>>>> restrictive or perhaps I missed a way to get low level (fops) access
>>>>> to it.
>>>>>
>>>>>>> It uses static branches so in the default off case it really
>>>>>>> should be cheap.
>>>>>>
>>>>>>
>>>>>> Not as cheap (for the off case) as simply sampling RING_HEAD/RING_TAIL
>>>>>
>>>>>
>>>>> Off case are three no-op instructions in three places in the irq
>>>>> tasklet. And a little bit of object size growth, if you worry about
>>>>> that aspect?
>>>>
>>>>
>>>> It's just how the snowball begins.
>>>
>>>
>>> We should be able to control it. We also have to consider which one is
>>> lighter for this particular use case.
>>>
>>>>>> which looks to be the same level of detail. I wrapped all this up in a
>>>>>> perf interface once upon a time...
>>>>>
>>>>>
>>>>> How does that work? Via periodic sampling? Accuracy sounds like it
>>>>> would be proportionate to the sampling frequency, no?
>>>>
>>>>
>>>> Right, and the sampling frequency is under user control (via perf) with
>>>> a default of around 1000, gives a small systematic error when dealing
>>>> with %
>>>>
>>>> I included power, interrupts, rc6, frequency (and the statistics but I
>>>> never used those and dropped them once oa landed), as well as
>>>> utilisation, just for the convenience of having sane interface :)
>>>
>>>
>>> Can you resurrect those patches? Don't have to rebase and all but I
>>> would like to see them at least.
>>
>> Mind that the idea behind the requested kind of stats is primarily usage
>> by the customers in the _product_ environment to track GPU occupancy and
>> predict based on these stats whether they can execute something else.
>> Which means that 1) debugfs and any kind of debug-like infrastructure is
>
>
> Yeah I acknowledged in the cover letter debugfs is not ideal.
>
> I could implement it in sysfs I suppose by doing time based transitions as
> opposed to having explicit open/release hooks. It wouldn't make a
> fundamental difference to this RFC from the overhead point of view.
>
> But most importantly we need to see in detail what Chris' perf based
> idea looks like and whether it fits your requirements.

+1 on perf pmu, that sounds much more like the userspace interface
you're looking for. If it's not that, then perhaps hand-rolled like
the i915 OA stuff we now have (but starting out with a perf pmu sounds
much better, at least for anything global which doesn't need to be
per-context or per-batch).
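
On the accuracy question raised in the quoted thread (periodic sampling
versus explicit busy/idle accounting), a rough sketch of the trade-off
follows. It is a toy model and not i915 or perf code: the function names,
the microsecond interval representation, and the reading of "around 1000"
as samples per second are all assumptions made for illustration.

```python
# Toy model only: compares two ways of estimating engine busyness.
# Nothing here is a real i915 or perf API; all names are hypothetical.

def exact_busy_fraction(busy_intervals_us, window_us):
    """Event-driven accounting: add up time between busy/idle transitions."""
    busy = 0
    for start, end in busy_intervals_us:
        # Clamp each busy interval to the observation window [0, window_us).
        busy += max(0, min(end, window_us) - max(start, 0))
    return busy / window_us


def sampled_busy_fraction(busy_intervals_us, window_us, sample_hz):
    """Periodic sampling: check 'is the engine busy?' sample_hz times a second."""
    period_us = 1_000_000 // sample_hz
    sample_times = range(0, window_us, period_us)
    hits = sum(
        1
        for t in sample_times
        if any(start <= t < end for start, end in busy_intervals_us)
    )
    return hits / len(sample_times)


# One second of activity: two busy bursts, given as (start, end) in microseconds.
intervals = [(0, 250_000), (500_000, 750_300)]
window = 1_000_000

exact = exact_busy_fraction(intervals, window)
at_1000hz = sampled_busy_fraction(intervals, window, 1000)
at_10hz = sampled_busy_fraction(intervals, window, 10)
```

In this model the sampled estimate at 1000 samples per second lands within
about 0.1 percentage points of the exact figure, while 10 samples per second
is off by roughly 10 points, which matches the intuition that the error
shrinks with the sampling period.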
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

