[Intel-gfx] [PATCH] [RFC] drm/i915: Generate a hang error code

Daniel Vetter daniel at ffwll.ch
Wed Feb 5 16:15:02 CET 2014


On Wed, Feb 05, 2014 at 02:59:08PM +0000, Jesse Barnes wrote:
> On Tue,  4 Feb 2014 12:18:55 +0000
> Ben Widawsky <benjamin.widawsky at intel.com> wrote:
> 
> > We get a large number of bugs which have a, "hey I have that too"
> > because they see a GPU hang in dmesg. While two machines of the same
> > model having a GPU hang is indeed a coincidence, it is far from enough
> > evidence to suggest they are the same.
> > 
> > In order to reduce this effect, and hopefully get people to file new bug
> > reports, clearly the error message itself has been insufficient (see ref
> > at the bottom for a new bug report with this characteristic).
> > 
> > The algorithm is purposely pretty naive. I don't think we need much in
> > order to avoid the problem I am trying to solve, and keeping it naive
> > gives us some ability to make a decent test case.
> 
> I like the direction of this.  If we can get some basic info into the
> dmesg part of things (the only part regular users will actually look
> at) we can probably avoid some of the "me too" action we see on general
> GPU hangs.  Having PID, comm, and some sort of hang signature are all
> good steps in that direction imo.

tbh I don't see much value in regular users trying to triage GPU hangs.
Unless they're damn sure that they have a dupe (which means the same
platform, the same versions of the software stack and the same crashing
game), I much prefer that they just file a new bug, even if it turns out
to be a duplicate, for us to triage.

With the mis-design of bugzilla it's much harder to untangle a wrong
me-too than to mark something as a duplicate. Long-running bugs
especially are a royal pain if there's too much wrong me-too noise in them.

Not a comment on the patch itself, just a general comment wrt avoiding
me-too gpu hang reports.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


