[igt-dev] [Intel-gfx] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy

Chris Wilson chris at chris-wilson.co.uk
Mon Feb 19 11:04:44 UTC 2018


Quoting Tvrtko Ursulin (2018-02-19 10:58:25)
> 
> On 19/02/2018 10:26, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
> >>
> >> On 19/02/2018 09:27, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
> >>>>
> >>>> Do you have a link to the BSW hang? Is that obviously related to PMU?
> >>>
> >>> It's only occurring in this test, just looks like an issue with the
> >>> spinner:
> >>>
> >>> [bsw] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_pmu@busy-accuracy-2-bcs0.html
> >>
> >> ...
> >> <0>[  681.022677] perf_pmu-1516    1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
> >> <0>[  681.022838] perf_pmu-1516    1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
> >> <0>[  681.023001] perf_pmu-1516    1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
> >> <0>[  681.023168] kworker/-338     1.... 298087910us : reset_common_ring: bcs0 seqno=a
> >> <0>[  681.023321] ksoftirq-17      1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
> >> <0>[  681.023482] ksoftirq-17      1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[  681.023644] ksoftirq-17      1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
> >> <0>[  681.023811] ksoftirq-17      1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops.
> >>
> >>> [kbl] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_pmu@busy-accuracy-2-bcs0.html
> >>
> >> ...
> >> <0>[  506.745332] perf_pmu-1544    3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
> >> <0>[  506.745397]   <idle>-0       2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
> >> <0>[  506.745440]   <idle>-0       2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
> >> <0>[  506.745498] kworker/-30      3.... 120840583us : reset_common_ring: bcs0 seqno=a
> >> <0>[  506.745547] ksoftirq-29      3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
> >> <0>[  506.745598] in:imklo-499     2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[  506.745637] in:imklo-499     2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
> >> <0>[  506.745676] in:imklo-499     2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops here.
> >>
> >> I have no idea what's happening here. In both cases I would expect the test
> >> to have exited after the GPU hang (or at least attempt to exit!), since it
> >> would detect it overran the timeout.
> >>
> >> Could it be stuck in gem_sync after the reset? Or somewhere else?
> > 
> > I think it's that we will be throwing the calibration off if it hangs.
> > If busy_ns = 10s, won't that generate a target idle time of 500s?
> 
> Indeed, well spotted. I'll need to add a hang detector of some sort.

Oh, I think I know why it's hanging. As the buffer will be idle, the
kernel is allowed to move it, and __submit_spin_batch() doesn't tell the
kernel to preserve the original address (so the kernel assumes that the
relocations are relative to the passed-in address and so moves the buffer
to match). I should have noticed that before, given the discussion around
EXEC_OBJECT_PINNED for the spinner.

I think there's an easy enough patch...
-Chris

