[Bug 99611] [SNB] GPU hang after over temperature

Thu Jul 6 07:15:09 UTC 2017

https://bugs.freedesktop.org/show_bug.cgi?id=99611

--- Comment #7 from Chris Tillman <toff.tillman at gmail.com> ---
Well, I think I agree with you, but that's not what the log told me to do:

Jan 31 21:12:29 ctillman kernel: [64809.320303] [drm] GPU HANG: ecode 6:0:0x86fafffa, in Xorg [612], reason: Hang on render ring, action: reset
Jan 31 21:12:29 ctillman kernel: [64809.320305] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 31 21:12:29 ctillman kernel: [64809.320306] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jan 31 21:12:29 ctillman kernel: [64809.320306] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 31 21:12:29 ctillman kernel: [64809.320306] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jan 31 21:12:29 ctillman kernel: [64809.320307] [drm] GPU crash dump saved to /sys/class/drm/card0/error
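
Since the kernel asks for the crash dump to be attached, here is a minimal sketch of how it could be copied out of sysfs to a regular file. The source path is the one printed in the log; the function name save_gpu_error_state and the destination file name are made up for illustration, and reading the sysfs file normally requires root.

```python
import shutil
from pathlib import Path


def save_gpu_error_state(src="/sys/class/drm/card0/error",
                         dst="gpu-error-state.txt"):
    """Copy the GPU error state to a regular file.

    Returns the destination Path, or None if the source does not exist.
    Both default names are assumptions; adjust card0 to the actual card.
    """
    src_path = Path(src)
    if not src_path.is_file():
        return None
    dst_path = Path(dst)
    # Stream the contents rather than trusting the size sysfs reports.
    with src_path.open("rb") as fin, dst_path.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

The resulting file can then be attached to the bug report.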

I understand that without the crash dump it can be difficult to pin down.
I tried to find a coretemp bug list but couldn't. I'm happy for you to
close it if you can't think of anything else.

On Thu, Jul 6, 2017 at 8:22 AM, <bugzilla-daemon at freedesktop.org> wrote:

> *Comment # 6 <https://bugs.freedesktop.org/show_bug.cgi?id=99611#c6> on
> bug 99611 <https://bugs.freedesktop.org/show_bug.cgi?id=99611> from
> Elizabeth <elizabethx.de.la.torre.mena at intel.com> *
>
> (In reply to Chris Tillman from comment #5 <https://bugs.freedesktop.org/show_bug.cgi?id=99611#c5>)
> > ... available measurements from coretemp were not being heeded. The
> > logs show that an overtemperature is reported only for a cycle until it
> > sets back, saying for example
> >
> > "[57849.613938] CPU1: Core temperature above threshold, cpu clock throttled
> > (total events = 12172)"
> > and then almost immediately (about 1 millisecond) after,
> > "[57849.614950] CPU1: Core temperature/speed normal"
> >
> > It appears from the logs that the only response to monitoring is an
> > immediate reset of the sensor, and that protection of the machine is not
> > occurring.
> >
> Hello Chris,
> Although this is in fact a problem, it seems to be more related to the CPU and
> the coretemp driver than to the GPU and DRM. If that's the case, there is not
> much we can do here. Could you please take some time to check the community
> forums for guidance on which product/component could be causing the problem,
> and then update the bug information? That would help in finding a solution:
> https://01.org/linuxgraphics/community
> Thank you.
>
> ------------------------------
> You are receiving this mail because:
>
>    - You reported the bug.
>
>

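For reference, the gap between the two quoted thermal messages can be worked out from their kernel timestamps. A small sketch follows; the helper names stamp_seconds and interval_us are made up for illustration, and the two log lines are copied from the quote above.

```python
import re
from decimal import Decimal

# Kernel log timestamps look like "[57849.613938]" (seconds since boot).
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def stamp_seconds(line):
    """Return the bracketed kernel timestamp of a log line as a Decimal."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return Decimal(match.group(1))


def interval_us(first, second):
    """Return the time between two log lines in whole microseconds."""
    return int((stamp_seconds(second) - stamp_seconds(first)) * 1000000)


throttled = ("[57849.613938] CPU1: Core temperature above threshold, "
             "cpu clock throttled (total events = 12172)")
normal = "[57849.614950] CPU1: Core temperature/speed normal"

print(interval_us(throttled, normal))  # prints 1012
```

That is 1012 microseconds, or roughly one millisecond between the throttle event and the return to normal.
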
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
You are on the CC list for the bug.