[Bug 203111] Unrecoverable GPU crash with DiRT 4

bugzilla-daemon at bugzilla.kernel.org
Tue Apr 9 01:39:44 UTC 2019


https://bugzilla.kernel.org/show_bug.cgi?id=203111

--- Comment #6 from Alex Deucher (alexdeucher at gmail.com) ---
(In reply to Thomas from comment #5)
> Thanks a lot for the detailed answer. I'm still not sure if I understand
> everything correctly (shouldn't the kernel driver validate the command
> stream from userspace/mesa and stop bad things before they hit hardware /
> hang the GPU?) 

It's not really feasible.  For one, it adds a lot of CPU overhead.  There is
also so much state in the 3D pipeline it's nearly impossible to validate all of
the possible cases that could cause a hang.  In some cases, you may not even
know that a particular combination is bad until it gets hit.

> 
> Damn, if this wouldn't be the wrong place I would ask for more details about
> your last reply (the thing about the display servers not catching up with
> the GPU reset - aren't there drivers which perform GPU resets just nice
> under X11 already? What about Wayland?). It's so freaking nice, I bet I
> would learn a lot if we would continue the discussion... Anyway, thanks again
> for explaining and sorry for me going a bit off topic in this reply.

I'm not sure if other drivers silently reset the GPU when they encounter a
hang.  It's generally easier to deal with on integrated GPUs since they operate
on system memory.  On dGPUs, the contents of vram might be lost after a GPU
reset as the memory controller is reset.  If vram is lost, the application that
is running needs to reload its vram state.  Also for reliability, applications
should really be made aware of a GPU reset so they can validate their data. 
E.g., you don't want a scientific application to silently get bad data because
the GPU was reset silently in the background.
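
To illustrate the point above, here is a minimal sketch in Python of an
application that notices a reset and reloads its state rather than trusting
the result.  Everything in it is a hypothetical simulation: FakeGpu, App,
reset_count and compute_sum are made-up names and no real driver or graphics
API is called.  Real applications would query something like
glGetGraphicsResetStatus (GL_KHR_robustness) or handle VK_ERROR_DEVICE_LOST
in Vulkan instead of reading a counter.

```python
class FakeGpu:
    """Hypothetical stand-in for a dGPU whose vram is lost on reset."""

    def __init__(self):
        self.reset_count = 0  # assumed: a counter the app can query
        self.vram = {}

    def upload(self, name, data):
        self.vram[name] = list(data)

    def reset(self):
        # Simulate a reset that also resets the memory controller.
        self.vram.clear()
        self.reset_count += 1

    def compute_sum(self, name):
        # A lost buffer reads back as empty, i.e. silently wrong data.
        return sum(self.vram.get(name, []))


class App:
    """Keeps a CPU-side copy so vram state can be reloaded after a reset."""

    def __init__(self, gpu, data):
        self.gpu = gpu
        self.data = list(data)
        self.seen_resets = gpu.reset_count
        self.gpu.upload("input", self.data)

    def run(self):
        while True:
            result = self.gpu.compute_sum("input")
            # Only trust the result if no reset happened since the last upload.
            if self.gpu.reset_count == self.seen_resets:
                return result
            # A reset was detected: reload vram state and retry.
            self.seen_resets = self.gpu.reset_count
            self.gpu.upload("input", self.data)
```

In this sketch a reset between two runs makes the raw compute_sum return a
wrong value, while App.run detects the changed counter, re-uploads its data
and only then returns a result.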

> 
> 
> One last thing... It's extremely off topic but I already derailed this reply
> and it has to be told: Thank you Alex for being the guy you are. I bet AMD
> doesn't pay you to explain technical details to stupid end users like me but
> that's very appreciated. You're a hero, keep on rockin'!

Thanks!  Glad to help.
