[Intel-gfx] [PATCH RFC] drm/i915: Print dmesg warn on unintended hangs
Daniel Vetter
daniel at ffwll.ch
Fri Nov 10 13:21:12 UTC 2017
On Fri, Nov 10, 2017 at 02:49:25PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Quoting Mika Kuoppala (2017-11-10 12:20:55)
> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >>
> >> > Quoting Mika Kuoppala (2017-11-10 11:53:47)
> >> >> We have a problem of distinguishing intended hangs
> >> >> submitted by igt during CI/bat and hangs that are unintended
> >> >> happening in close proximity.
> >> >
> >> > Do we? I haven't had that problem in distinguishing them.
> >>
> >> Piglit can't tell them apart afaik, due to the info log level.
> >
> > Piglit? If the test passes, it doesn't matter how the kernel got there,
> > the user behaviour is as expected. If the test wants to assert that it
> > didn't hang, it can do that.
>
> Through reset counts? For starters we could assert in the framework that
> all tests that do not call igt_hang() expect the reset count to
> stay the same between entry and exit.
>
> I see the logic behind "the user behaviour is as expected".
>
> It would be good if the CI folks chimed in here and detailed how
> they want things to work.

I'm very wary of having to sprinkle that all over CI tbh, but if it's in
the framework I guess it can work too. It will be fun to figure out how to
catch unintended hangs in the tests that do provoke hangs, but it should be
doable.
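
A framework-level check along the lines Mika suggests could be sketched
roughly as follows. This is purely illustrative: `struct checked_test`,
`run_and_check_resets()` and the reset-count callback are hypothetical
names, not igt API, and the fake counter only stands in for wherever a
real framework would read the GPU reset count from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical hook for reading the GPU reset count; not a real igt API. */
typedef unsigned int (*reset_count_fn)(void *ctx);

/* Hypothetical test descriptor: the test body plus whether it provokes
 * hangs on purpose (as tests calling igt_hang() do). */
struct checked_test {
	const char *name;
	void (*body)(void *ctx);
	bool expects_hang;
};

/* Run one test; return true if it was not expected to hang but the
 * reset count changed between entry and exit. */
static bool run_and_check_resets(const struct checked_test *t,
				 reset_count_fn read_resets, void *ctx)
{
	assert(t && read_resets);

	unsigned int before = read_resets(ctx);
	t->body(ctx);
	unsigned int after = read_resets(ctx);

	bool unintended = !t->expects_hang && after != before;
	if (unintended)
		fprintf(stderr, "%s: unexpected GPU reset (%u -> %u)\n",
			t->name, before, after);
	return unintended;
}

/* Simulated GPU state, for demonstration only. */
struct fake_gpu {
	unsigned int resets;
};

static unsigned int fake_read_resets(void *ctx)
{
	return ((struct fake_gpu *)ctx)->resets;
}

static void quiet_test(void *ctx)
{
	(void)ctx; /* does nothing, so no reset happens */
}

static void hanging_test(void *ctx)
{
	/* pretend the GPU hung and the driver reset it */
	((struct fake_gpu *)ctx)->resets++;
}
```

Note that a test flagged as expecting a hang skips the check entirely,
which is exactly the open problem of catching unintended hangs in tests
that provoke hangs on purpose.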

But as for adding it to the framework, I think we're already putting way
too much random quiescent stuff in there, and generic kms tests kinda
don't need any of that. So I'm not entirely sold that this is the best
approach we can take.

A semi-middle ground would be new functions that open a gem fd for
rendering, plus some sanity checks so that the igt ioctl wrappers only
let you render when you explicitly asked for it. Then we could stuff all
these checks in there.
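
The gating idea could look roughly like this sketch. Again,
`struct test_fd`, `open_kms_fd()`, `open_render_fd()` and
`checked_execbuf()` are hypothetical names made up for illustration, not
igt functions. Only fds opened for rendering may submit work, so the
execbuf wrapper becomes a single natural place to attach hang checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper around a DRM fd that records whether the test
 * explicitly asked for rendering when opening it. */
struct test_fd {
	int fd;
	bool rendering;
};

/* Hypothetical: open for modesetting only, no GPU submission allowed. */
static struct test_fd open_kms_fd(int raw_fd)
{
	struct test_fd f = { raw_fd, false };
	return f;
}

/* Hypothetical: open for rendering, GPU submission allowed. */
static struct test_fd open_render_fd(int raw_fd)
{
	struct test_fd f = { raw_fd, true };
	return f;
}

/* Hypothetical execbuf wrapper: refuses to submit on an fd that was not
 * opened for rendering. A real version would issue
 * DRM_IOCTL_I915_GEM_EXECBUFFER2 here and could also snapshot the
 * reset count around the submission. */
static int checked_execbuf(const struct test_fd *f)
{
	assert(f != NULL);

	if (!f->rendering) {
		fprintf(stderr, "fd %d: rendering not requested, refusing execbuf\n",
			f->fd);
		return -EPERM;
	}
	/* ... actual submission would go here ... */
	return 0;
}
```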

But that still leaves the issue that a gpu hang in e.g. an s/r test or a
module reload won't be caught, and we really want to catch these.
Module reload btw is also a case where simply checking the reset counter
won't work. And module reload is exactly one of the cases where we do
want to make sure we don't misprogram the gpu so that it dies.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch