[mipsel+rs780e] Occasional "GPU lockup" after resuming from suspend

Tue Nov 8 06:25:07 PST 2011

2011/11/8  <chenhc at lemote.com>:
> Also, I would like to ask two things:
> 1. Does the GPU use the MC to access the GTT?

Yes.  All GPU clients (display, 3D, etc.) go through the MC to access
memory (vram or gart).

> 2. What can cause an MC timeout?

Lots of things.  Some GPU client may still be active, or some GPU client
may be hung or not properly initialized.
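
As a rough illustration of what the "Wait for MC idle timedout" message is
checking: as far as I recall, the r600 code of this era polls SRBM_STATUS
and treats the MC as busy while any bit in the mask 0x3F00 is set.  The
sketch below applies that mask to the SRBM_STATUS values from the log.  The
mask value and the helper name mc_busy_bits are assumptions for
illustration, so check them against r600.c in your tree.

```python
# Sketch: apply the MC-busy mask to SRBM_STATUS values taken from the log.
# MC_BUSY_MASK = 0x3F00 is an assumption based on r600_mc_wait_for_idle();
# verify it against the driver source you are actually running.
MC_BUSY_MASK = 0x3F00


def mc_busy_bits(srbm_status: int) -> int:
    """Return the masked busy bits; zero would mean the MC looks idle."""
    return srbm_status & MC_BUSY_MASK


if __name__ == "__main__":
    # SRBM_STATUS values copied from the kernel log in this thread.
    samples = {
        "before soft reset": 0x20023040,
        "after soft reset": 0x2002B040,
    }
    for label, value in samples.items():
        bits = mc_busy_bits(value)
        state = "busy" if bits else "idle"
        print(f"{label}: SRBM_STATUS=0x{value:08X} masked=0x{bits:04X} ({state})")
```

If that mask is right, both values leave 0x3000 set, which would be
consistent with the MC never going idle across the soft reset.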

Alex

>
>> Hi,
>>
>> A status update.
>> On September 29, 2011 at 5:17 PM, Chen Jie <chenj at lemote.com> wrote:
>>> Hi,
>>> Adding more information.
>>> We occasionally get a "GPU lockup" after resuming from suspend (on a
>>> mipsel platform with a MIPS64-compatible CPU and an rs780e; the kernel
>>> is 64-bit 3.1.0-rc8).  Related kernel messages:
>>> /* return from STR */
>>> [  156.152343] radeon 0000:01:05.0: WB enabled
>>> [  156.187500] [drm] ring test succeeded in 0 usecs
>>> [  156.187500] [drm] ib test succeeded in 0 usecs
>>> [  156.398437] ata2: SATA link down (SStatus 0 SControl 300)
>>> [  156.398437] ata3: SATA link down (SStatus 0 SControl 300)
>>> [  156.398437] ata4: SATA link down (SStatus 0 SControl 300)
>>> [  156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [  156.597656] ata1.00: configured for UDMA/133
>>> [  156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
>>> [  157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
>>> [  157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
>>> [  157.683593] r8169 0000:02:00.0: eth0: link up
>>> [  165.621093] PM: resume of devices complete after 9679.556 msecs
>>> [  165.628906] Restarting tasks ... done.
>>> [  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
>>> [  177.089843] ------------[ cut here ]------------
>>> [  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
>>> [  177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
>>> [  177.113281] Modules linked in: psmouse serio_raw
>>> [  177.117187] Call Trace:
>>> [  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
>>> [  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
>>> [  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
>>> [  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
>>> [  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
>>> [  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
>>> [  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
>>> [  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
>>> [  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
>>> [  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
>>> [  177.179687] ---[ end trace 92f63d998efe4c6d ]---
>>> [  177.187500] radeon 0000:01:05.0: GPU softreset
>>> [  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
>>> [  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
>>> [  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
>>> [  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
>>> [  177.367187] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
>>> [  177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
>>> [  177.414062] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
>>> [  177.417968] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
>>> [  177.425781] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2002B040
>>> [  177.433593] radeon 0000:01:05.0: GPU reset succeed
>>> [  177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
>>> [  177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
>>> [  177.804687] radeon 0000:01:05.0: WB enabled
>>> [  178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
>> After pinning the ring in VRAM, we get an IB test failure instead. It
>> seems something is wrong with accesses that go through the GTT.
>>
>> We dumped the GART table just after stopping the CP and compared it with
>> the one dumped just after r600_pcie_gart_enable, and did not find any
>> difference.
>>
>> Any idea?
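
For the table comparison described above, a small script can diff two dumps
entry by entry.  This is only a sketch: it assumes each dump is a raw binary
blob of little-endian 64-bit page table entries, and the function name
diff_gart_dumps and the sample data are hypothetical.

```python
import struct

ENTRY_SIZE = 8  # assumption: one 64-bit little-endian entry per GART page


def diff_gart_dumps(a: bytes, b: bytes):
    """Return a list of (index, entry_a, entry_b) for entries that differ."""
    if len(a) != len(b):
        raise ValueError("dumps differ in size")
    if len(a) % ENTRY_SIZE:
        raise ValueError("dump size is not a multiple of the entry size")
    count = len(a) // ENTRY_SIZE
    entries_a = struct.unpack(f"<{count}Q", a)
    entries_b = struct.unpack(f"<{count}Q", b)
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(entries_a, entries_b))
        if x != y
    ]


if __name__ == "__main__":
    # Synthetic data standing in for two real dumps read from files.
    after_enable = struct.pack("<4Q", 0x1000, 0x2000, 0x3000, 0x4000)
    after_cp_stop = struct.pack("<4Q", 0x1000, 0x2000, 0x3007, 0x4000)
    for index, old, new in diff_gart_dumps(after_enable, after_cp_stop):
        print(f"entry {index}: 0x{old:016X} -> 0x{new:016X}")
```

An empty result only says the table contents match; it says nothing about
whether the GPU is actually reading that table correctly after resume.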
>>
>>> [  178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
>>> [  178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
>>> [  178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
>>> [  179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
>>> ...
>>
>>
>>
>> Regards,
>> -- Chen Jie
>>