[igt-dev] [Intel-gfx] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Feb 19 09:57:20 UTC 2018


On 19/02/2018 09:27, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>>
>> Do you have a link to BSW hang? Is that obviously related to PMU?
> 
> It's only occurring in this test, just looks like an issue with the
> spinner:
> 
> [bsw] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_pmu@busy-accuracy-2-bcs0.html

...
<0>[  681.022677] perf_pmu-1516    1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.022838] perf_pmu-1516    1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
<0>[  681.023001] perf_pmu-1516    1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
<0>[  681.023168] kworker/-338     1.... 298087910us : reset_common_ring: bcs0 seqno=a
<0>[  681.023321] ksoftirq-17      1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.023482] ksoftirq-17      1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  681.023644] ksoftirq-17      1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  681.023811] ksoftirq-17      1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops.

> [kbl] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_pmu@busy-accuracy-2-bcs0.html

...
<0>[  506.745332] perf_pmu-1544    3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745397]   <idle>-0       2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
<0>[  506.745440]   <idle>-0       2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
<0>[  506.745498] kworker/-30      3.... 120840583us : reset_common_ring: bcs0 seqno=a
<0>[  506.745547] ksoftirq-29      3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745598] in:imklo-499     2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  506.745637] in:imklo-499     2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  506.745676] in:imklo-499     2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops here.

I have no idea what's happening here. In both cases I would expect the test
to have exited after the GPU hang (or at least to have attempted to exit!),
since it would have detected that it overran the timeout.

Could it be stuck in gem_sync after the reset? Or somewhere else?
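A minimal sketch of the reasoning above. It assumes the overrun check is a polled deadline tested between iterations of the busy loop. The `struct deadline` helpers are hypothetical and are not existing IGT functions. A check like this can only run once control returns from the wait. If the test were blocked indefinitely in something like gem_sync after the reset, it would never reach the check, so it could not notice the overrun and exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical deadline helper for illustration; not an IGT API. */
struct deadline {
	uint64_t end_ns; /* absolute CLOCK_MONOTONIC time at which it expires */
};

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Arm the deadline timeout_ns nanoseconds from now. */
static void deadline_start(struct deadline *d, uint64_t timeout_ns)
{
	d->end_ns = monotonic_ns() + timeout_ns;
}

/*
 * Polled check: it is only evaluated when the caller regains control,
 * so it cannot fire while the caller is blocked in an unbounded wait.
 */
static bool deadline_expired(const struct deadline *d)
{
	return monotonic_ns() >= d->end_ns;
}
```

In a loop of the form "submit spinner, wait, check deadline_expired()", a wait that never returns means the last step is never executed. That is consistent with the traces stopping without the test reporting anything.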

Could we add the equivalent of "echo t > /proc/sysrq-trigger" when owatch triggers?

Or would that overflow some buffer? It should work in cases like this one,
where the machine itself has not hung.
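A sketch of the "echo t > /proc/sysrq-trigger" equivalent in C. It assumes the watchdog can run code as root when it fires. `sysrq_write()` is a hypothetical helper. The only real interface it relies on is that writing a single command character to /proc/sysrq-trigger invokes that SysRq operation, and that 't' dumps the current tasks and their stacks to the kernel log. The path is a parameter so the helper can also be exercised against an ordinary file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one SysRq command character to `path`,
 * normally "/proc/sysrq-trigger" (writing there requires root).
 * With cmd == 't' this mirrors `echo t > /proc/sysrq-trigger`.
 * Returns 0 on success or a negative errno value on failure.
 */
static int sysrq_write(const char *path, char cmd)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -errno;

	ret = write(fd, &cmd, 1);
	if (ret != 1) {
		/* A short write with no errno set is reported as EIO. */
		int err = ret < 0 ? errno : EIO;

		close(fd);
		return -err;
	}

	close(fd);
	return 0;
}
```

A watchdog would call `sysrq_write("/proc/sysrq-trigger", 't')` just before killing the test. On a machine with many tasks the resulting dump is large, which is where the buffer concern comes in.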

Regards,

Tvrtko

