[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Add support for asynchronous display power disabling

Imre Deak imre.deak at intel.com
Mon May 6 09:44:41 UTC 2019


On Fri, May 03, 2019 at 04:52:58PM +0300, Imre Deak wrote:
> > > > [...]
> > > >   * igt at gem_persistent_relocs@forked-interruptible-thrashing:
> > > >     - shard-glk:          [PASS][1] -> [TIMEOUT][2]
> > > >    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6032/shard-glk6/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
> > > >    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12955/shard-glk7/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
> > > 
> > > Looks like an unrelated issue: on this GLK there are two HDMI displays
> > > connected, so the change shouldn't make any difference on it. The change
> > > only affects the DP detect and hotplug paths, where we'll now do an
> > > async power domain put.
> > 
> > There's no history of glk locking up there, 
> > 
> > > The machine is still up when the problem happens, the test seems to get
> > > stuck and aborted by the test runner (after ~6mins according to [1]).
> > > 
> > > [43/82] (762s left) gem_persistent_relocs (forked-interruptible-thrashing)
> > > Starting subtest: forked-interruptible-thrashing
> > > Timeout. Killing the current test with SIGQUIT.
> > > Timeout. Killing the current test with SIGKILL.
> > 
> > and yet it locked up sufficiently to not respond to a signal, suggesting
> > an oops (the test takes 3s normally on glk).
> 
> No pstore logs either. I also noticed that the run [1] above resulted in
> an incomplete, if that's indicative of anything. The same goes for the
> previous Patchwork_12954 run.

For reference, here is what we discussed on IRC:

There is also a previous Trybot run on SKL that hung in the same test in
a similar way:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4242/shard-skl9/igt%40gem_persistent_relocs%40forked-interruptible-thrashing.html

We're missing stack dumps to better isolate the problem, but based on
the above it's unlikely to be caused by the changes in this patchset.
Chris suggested running cat /proc/*/stack on timeout and ptracing the
child from the IGT runner; adding Petri for that.
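
A rough sketch of the /proc/*/stack part of that suggestion, in Python.
This is only an illustration, not what the IGT runner does: the function
names dump_kernel_stacks and format_stacks are made up, and reading
/proc/<pid>/stack typically needs root and a kernel built with stack
trace support.

```python
import os


def dump_kernel_stacks(proc_root="/proc"):
    """Return {pid: kernel stack text} for each process under proc_root.

    Hypothetical helper for illustration; proc_root is a parameter only
    so the function can be pointed at a test directory.
    """
    stacks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        path = os.path.join(proc_root, entry, "stack")
        try:
            with open(path, "r") as f:
                stacks[int(entry)] = f.read()
        except OSError:
            # The process exited in the meantime, the file does not exist,
            # or we lack the privileges to read it.
            continue
    return stacks


def format_stacks(stacks):
    """Render the collected stacks as one text blob, ordered by pid."""
    lines = []
    for pid in sorted(stacks):
        lines.append(f"--- pid {pid} ---")
        lines.append(stacks[pid].rstrip())
    return "\n".join(lines)
```

The idea would be to call format_stacks(dump_kernel_stacks()) right
before the runner sends SIGQUIT, and to store the result with the test
logs.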

I'm trying to repro the problem on SKL/GLK with and without this
patchset; so far I haven't hit the issue.

I opened the following bug to capture the findings:
https://bugs.freedesktop.org/show_bug.cgi?id=110618

and will check out Chris' patchset, which may be related to or fix the problem:
https://patchwork.freedesktop.org/series/60257/

--Imre

