[Bug 75701] New: Radeon: GPU recovery is unable to recover from GPU lockups (HD5770 - OpenCL example).

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Wed May 7 18:46:51 PDT 2014


https://bugzilla.kernel.org/show_bug.cgi?id=75701

            Bug ID: 75701
           Summary: Radeon: GPU recovery is unable to recover from GPU
                    lockups (HD5770 - OpenCL example).
           Product: Drivers
           Version: 2.5
    Kernel Version: 3.15-rc4
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: t3st3r at mail.ru
        Regression: No

Created attachment 135351
  --> https://bugzilla.kernel.org/attachment.cgi?id=135351&action=edit
Unsuccessful GPU recovery attempt - kernel log

There are cases where Radeon GPUs lock up because of MESA errors and the like.
While those are MESA bugs, there is what I believe to be a kernel-side bug as
well.

The kernel-side problem is how the kernel handles the GPU recovery procedure.
Right now GPU recovery fails most of the time on virtually any MESA bug and any
GPU, and the system is left in a completely unusable state due to lack of
graphic output.


A couple of recent examples will be filed, for 2 GPU families.
*This* bug is for a GPU deadlock on HD5770 (Evergreen - JUNIPER) triggered by
buggy MESA OpenCL operations.

To reproduce:
1) Install Ubuntu 14.04.
2) Add "oibaf PPA" to get recent MESA-based drivers. 
3) Update GPU drivers from Oibaf PPA.
4) Install mesa-opencl-icd library for OpenCL (icd based) support.
5) Boot with the 3.15-rc4 kernel (it can be self-compiled or taken from the
kernel PPA; this does not affect the bug).
6) Get the "Clpeak" tool (https://github.com/krrishnarraj/clpeak.git), an
OpenCL VRAM benchmark tool, and build it.
7) Try to run it.
8) The program will run some benchmarks. Then the GPU will lock up.
9) The kernel will then attempt recovery. It fails every time.

Result:
 GPU locks up. Recovery fails. The system is left in an unusable state due to
lack of graphic output.
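
One way to check whether a given hang matches this result is to scan a saved
kernel log for radeon lockup and reset messages. The following is only a
minimal Python sketch: the function name summarize_recovery is hypothetical,
and the keywords it looks for ("lockup", "reset" followed by "fail" or
"succeed") are assumptions about the log wording, which differs between kernel
versions. They are not taken from the attached log and should be adjusted to
match it.

```python
import re

# Assumed keywords: the exact wording of radeon messages varies between
# kernel versions, so these patterns are guesses to be tuned against a
# real log, not a definitive list.
LOCKUP = re.compile(r"radeon.*lockup", re.IGNORECASE)
RESET_FAILED = re.compile(r"radeon.*reset.*fail", re.IGNORECASE)
RESET_OK = re.compile(r"radeon.*reset.*succeed", re.IGNORECASE)


def summarize_recovery(log_text):
    """Count radeon lockup and reset lines in a kernel log given as a string."""
    counts = {"lockups": 0, "failed_resets": 0, "successful_resets": 0}
    for line in log_text.splitlines():
        if LOCKUP.search(line):
            counts["lockups"] += 1
        # A single line is counted as either a failed or a successful reset.
        if RESET_FAILED.search(line):
            counts["failed_resets"] += 1
        elif RESET_OK.search(line):
            counts["successful_resets"] += 1
    return counts
```

It could be fed the contents of a saved dmesg dump, for example
summarize_recovery(open("dmesg.txt").read()), where dmesg.txt is a placeholder
file name. Lockups with only failed resets and no successful ones would match
the behaviour described in this report.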

Expected:
 More or less sane GPU recovery. Some data could be lost, the picture can be
distorted, some OpenCL/OpenGL calls can return errors, some programs can crash.
But leaving the GPU in a faulty state and trying to restore the very same
faulty state (without success, obviously) isn't an option. What happens now is
the absolute worst kind of GPU recovery, as it leaves the system in an unusable
state with a GPU which can't be brought back without a reboot (there is no
screen output at this point).

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

