[Bug 108854] [polaris11] - GPU Hang - ring gfx timeout

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Feb 22 18:15:41 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=108854

--- Comment #16 from Tom Seewald <tseewald at gmail.com> ---
(In reply to Tom St Denis from comment #15)
> If you can't reproduce on a newer version of mesa then it's "been fixed" :-)

My (probably incorrect) understanding is roughly this:

    +-------+-------+
1.) |  Application  |
    +-------+-------+
       |
       | Possibly sending bad commands/calls to Mesa
       |
       v
    +------+---------+
2.) |     Mesa       |
    +------+---------+
       |
       | Passing on bad calls from the application
       |     or
       | There is a bug in Mesa itself where it is sending bad calls/commands
       | to the kernel
       v
    +--------+--------+
3.) |  Kernel/amdgpu  |
    +--------+--------+
       |
       | amdgpu puts the physical device in a bad state due to bad commands
       | from Mesa
       v
    +--------+--------+
4.) |       GPU       |
    +--------+--------+

Given that Mesa 18.3.3+ "fixes" the issue, it sounds like a specific case of
Mesa sending garbage to the kernel (step 2 to 3) has been fixed.

But in general, shouldn't the kernel driver (ideally) be able to handle Mesa
passing malformed/bad commands rather than freezing the device (step 3 to 4)?
I understand that not every case can be covered, and that GPU resets need to be
supported in user space for seamless recovery, but shouldn't the driver
"unstick" itself enough that the computer can be rebooted normally?
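
To make the question concrete, here is a toy model of the two behaviours I am
asking about: rejecting a bad submission before it reaches the hardware, versus
letting it through and resetting afterwards. This is only a sketch of my mental
model, not amdgpu or Mesa code. Every name in it (ToyGpu, submit,
VALID_OPCODES, the opcode strings) is made up for illustration, and a real
device can presumably hang in ways that no simple check like this would catch.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical opcodes for a toy command stream. These are NOT real
# amdgpu/PM4 packets; they only exist to make the example runnable.
VALID_OPCODES = {"DRAW", "COPY", "FENCE"}


@dataclass
class ToyGpu:
    """Stand-in for step 4 (the GPU): it hangs on an unknown opcode."""
    hung: bool = False
    executed: List[str] = field(default_factory=list)

    def execute(self, opcode: str) -> None:
        if self.hung:
            return  # a hung ring makes no further progress
        if opcode not in VALID_OPCODES:
            self.hung = True  # bad command wedges the device
            return
        self.executed.append(opcode)

    def reset(self) -> None:
        """Model a successful reset: the ring accepts work again."""
        self.hung = False


def submit(gpu: ToyGpu, commands: List[str], validate: bool) -> str:
    """Stand-in for step 3 (the kernel driver).

    validate=True:  reject the whole submission up front (step 2 to 3).
    validate=False: pass everything on, then reset if the device hung
                    (the "unstick" case, step 3 to 4).
    """
    if validate and any(op not in VALID_OPCODES for op in commands):
        return "rejected"
    for op in commands:
        gpu.execute(op)
    if gpu.hung:
        gpu.reset()
        return "reset"
    return "ok"


gpu = ToyGpu()
print(submit(gpu, ["DRAW", "BOGUS"], validate=True))   # rejected
print(submit(gpu, ["DRAW", "BOGUS"], validate=False))  # reset
print(submit(gpu, ["DRAW", "FENCE"], validate=False))  # ok
```

In this model the third submission succeeds only because reset() brought the
device back, which is the minimum I would hope for from the real driver even
when the bad submission itself cannot be rejected.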

Thanks for your time and patience.
