[Bug 112226] [HadesCanyon/regression] GPU hang causes also X server to die

bugzilla-daemon at freedesktop.org
Thu Nov 7 14:35:53 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=112226

Eero Tamminen <eero.t.tamminen at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[HadesCanyon] GPU hangs     |[HadesCanyon/regression]
                   |don't anymore recover       |GPU hang causes also X
                   |(although kernel still      |server to die
                   |claims that they do)        |

--- Comment #3 from Eero Tamminen <eero.t.tamminen at intel.com> ---
(In reply to Alex Deucher from comment #1)
> Please attach your dmesg output and xorg log if using X.  Please note that
> after a GPU reset, in most cases you need to restart your desktop
> environment because no desktop environments properly handle the loss of
> their contexts at the moment.

The failed tests complain about an invalid MIT-MAGIC-COOKIE-1 key, so it seems
that the later failures happen because X went down (and was brought back up by
the display manager).

AFAIK a reset should affect only the context that was running on the GPU when
it was reset, not the others [1], and in this case the problematic client
should be GfxBench (Manhattan test-case, see bug 108898), not the X server.

Btw, why doesn't the AMD kernel module tell which process / context had the
issue, like i915 does?

[1] At least that's the case with i915, as long as the whole system doesn't
hang. 


(In reply to Eero Tamminen from comment #0)
> * If latest Mesa is used with drm-tip kernel 5.3, 4/5 times X fails to
> start.  This started to happen with Mesa version within couple of days of
> the GPU hang recovery issue, so potentially there are more issue in Mesa
> (HadesCanyon) AMD support

Correction.  That issue happens only when using the latest Mesa with a few
months old X server and the (5.3) drm-tip kernel.  If the latest git versions
of all of them are used, X starts fine.  But since the indicated date, it dies
later, when the Manhattan test-case causes problems.

-- 
You are receiving this mail because:
You are the assignee for the bug.