[igt-dev] [Intel-gfx] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Feb 19 10:58:25 UTC 2018


On 19/02/2018 10:26, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
>>
>> On 19/02/2018 09:27, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>>>>
>>>> Do you have a link to BSW hang? Is that obviously related to PMU?
>>>
>>> It's only occurring in this test, just looks like an issue with the
>>> spinner:
>>>
>>> [bsw] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_pmu@busy-accuracy-2-bcs0.html
>>
>> ...
>> <0>[  681.022677] perf_pmu-1516    1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
>> <0>[  681.022838] perf_pmu-1516    1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
>> <0>[  681.023001] perf_pmu-1516    1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
>> <0>[  681.023168] kworker/-338     1.... 298087910us : reset_common_ring: bcs0 seqno=a
>> <0>[  681.023321] ksoftirq-17      1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
>> <0>[  681.023482] ksoftirq-17      1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
>> <0>[  681.023644] ksoftirq-17      1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
>> <0>[  681.023811] ksoftirq-17      1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>>
>> Everything stops.
>>
>>> [kbl] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_pmu@busy-accuracy-2-bcs0.html
>>
>> ...
>> <0>[  506.745332] perf_pmu-1544    3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
>> <0>[  506.745397]   <idle>-0       2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
>> <0>[  506.745440]   <idle>-0       2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
>> <0>[  506.745498] kworker/-30      3.... 120840583us : reset_common_ring: bcs0 seqno=a
>> <0>[  506.745547] ksoftirq-29      3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
>> <0>[  506.745598] in:imklo-499     2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
>> <0>[  506.745637] in:imklo-499     2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
>> <0>[  506.745676] in:imklo-499     2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>>
>> Everything stops here.
>>
>> I have no idea what's happening here. In both cases I would expect the test
>> to have exited after the GPU hang (or at least attempt to exit!), since it
>> would detect it overran the timeout.
>>
>> Could it be stuck in gem_sync after the reset? Or somewhere else?
> 
> I think it's that we will be throwing the calibration off if it hangs.
> If busy_ns = 10s, won't that generate a target idle time of 500s?

Indeed, well spotted. I'll need to add a hang detector of some sort.
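
For illustration, the arithmetic behind that 500s figure can be sketched as
below. This is a minimal sketch and not the actual perf_pmu code: it assumes
the idle period is derived from the measured busy period and the target busy
percentage (2% for busy-accuracy-2), and both function names plus the
max_busy_ns limit are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Idle time needed so that busy_ns makes up target_busy_pct percent of
 * one busy + idle cycle. Assumed formula, not taken from perf_pmu.
 */
static uint64_t target_idle_ns(uint64_t busy_ns, unsigned int target_busy_pct)
{
	return busy_ns * (100 - target_busy_pct) / target_busy_pct;
}

/*
 * Hypothetical hang guard: reject a busy sample longer than the caller's
 * limit, since a hung spinner would inflate the calibration.
 */
static bool busy_sample_plausible(uint64_t busy_ns, uint64_t max_busy_ns)
{
	return busy_ns <= max_busy_ns;
}
```

With busy_ns = 10s and a 2% target, target_idle_ns() gives 490s, which is
roughly the 500s mentioned above. A guard along the lines of
busy_sample_plausible() is one possible shape for a hang detector: skip or
abort the cycle instead of sleeping for a bogus idle period.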

In the meantime I am trying to figure out how to wire up GuC to engine stats.
The fix that gets the correct state on stats enable by looking at the ports is
a problem, given the different tracking I had in GuC mode.

Regards,

Tvrtko



