[Intel-gfx] [PATCH v3] drm/i915/pmu: Reconstruct active state on starting busy-stats

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Jan 12 11:43:11 UTC 2018


On 12/01/2018 10:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 10:30:26)
>>
>>
>> On 12/01/2018 09:51, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-01-12 09:40:40)
>>>> So submit side doesn't work in either case, unless I am missing
>>>> something. Would need the pair of port manipulation and context_in to be
>>>> atomic.
>>>
>>> Sure, there is a small window with the result that we either never turn
>>> off the stats, or turn them off one too early (and then recover if the
>>> my understanding of the underflow protection holds). The same problem as
>>> existed before the reconstruction, except now the window is much
>>> smaller. I'm not that scared by this (it should not explode, nor result
>>> in hopelessly wrong stats) so it can wait until stats enabling doesn't
>>> need to peek at execlists. I think you will need to postpone enabling
>>> until the next context-switch if we wanted to avoid the atomics; except
>>> that poses a problem with the igt expecting busy-start to be accurate. A
>>> dilemma for later.
>>
>> My analysis was partially incorrect, yes, there is underflow protection
>> already.
>>
>> But I don't see that there is a race window where we end up with
>> permanent 100% busyness before the reconstruction patch. Where do you
>> see that?
>>
>> The worst I see without the reconstruction is to miss the accounting of
>> the batch currently in progress when stats get enabled. Which is a much
>> less serious, totally ignorable event.
> 
> One is observable via pmu, the other not. If we fail to turn off
> busy-stats accounting, nothing is lost except for a few wasted cycles.
> Except on the next pmu, it starts from the previous cs instead of the
> enabling -- but that is a problem that also exists with the second user.

I don't follow, I am talking about permanent 100%. Let me copy what I 
wrote earlier:

port0 context complete
context_out - not enabled, no-op
stats enable - port0 busy, active = 1
port0 clear
submit
context_in - active = 2 !!! BAD
port0 set

That leads to permanent 100% until busy stats are disabled. I think that 
is hugely less desirable than just failing to account for the currently 
running batch, which was the case before reconstruction on enable, and 
is 99.9% only a problem for IGTs.

Do you think there is a flaw in this analysis or something else?

Regards,

Tvrtko






More information about the Intel-gfx mailing list