[mipsel+rs780e]Occasionally "GPU lockup" after resuming from suspend.

Thu Sep 29 02:17:13 PDT 2011

Hi,

Adding some more information.

We occasionally get a "GPU lockup" after resuming from suspend (on a mipsel
platform with a MIPS64-compatible CPU and an RS780E; the kernel is 3.1.0-rc8,
64-bit).  Related kernel messages:
/* return from STR */
[  156.152343] radeon 0000:01:05.0: WB enabled
[  156.187500] [drm] ring test succeeded in 0 usecs
[  156.187500] [drm] ib test succeeded in 0 usecs
[  156.398437] ata2: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata3: SATA link down (SStatus 0 SControl 300)
[  156.398437] ata4: SATA link down (SStatus 0 SControl 300)
[  156.578125] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  156.597656] ata1.00: configured for UDMA/133
[  156.613281] usb 1-5: reset high speed USB device number 4 using ehci_hcd
[  157.027343] usb 3-2: reset low speed USB device number 2 using ohci_hcd
[  157.609375] usb 3-3: reset low speed USB device number 3 using ohci_hcd
[  157.683593] r8169 0000:02:00.0: eth0: link up
[  165.621093] PM: resume of devices complete after 9679.556 msecs
[  165.628906] Restarting tasks ... done.
[  177.085937] radeon 0000:01:05.0: GPU lockup CP stall for more than 10019msec
[  177.089843] ------------[ cut here ]------------
[  177.097656] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x25c/0x33c()
[  177.105468] GPU lockup (waiting for 0x000013C3 last fence id 0x000013AD)
[  177.113281] Modules linked in: psmouse serio_raw
[  177.117187] Call Trace:
[  177.121093] [<ffffffff806f3e7c>] dump_stack+0x8/0x34
[  177.125000] [<ffffffff8022e4f4>] warn_slowpath_common+0x78/0xa0
[  177.132812] [<ffffffff8022e5b8>] warn_slowpath_fmt+0x38/0x44
[  177.136718] [<ffffffff80522ed8>] radeon_fence_wait+0x25c/0x33c
[  177.144531] [<ffffffff804e9e70>] ttm_bo_wait+0x108/0x220
[  177.148437] [<ffffffff8053b478>] radeon_gem_wait_idle_ioctl+0x80/0x114
[  177.156250] [<ffffffff804d2fe8>] drm_ioctl+0x2e4/0x3fc
[  177.160156] [<ffffffff805a1820>] radeon_kms_compat_ioctl+0x28/0x38
[  177.167968] [<ffffffff80311a04>] compat_sys_ioctl+0x120/0x35c
[  177.171875] [<ffffffff80211d18>] handle_sys+0x118/0x138
[  177.179687] ---[ end trace 92f63d998efe4c6d ]---
[  177.187500] radeon 0000:01:05.0: GPU softreset
[  177.191406] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xF57C2030
[  177.195312] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00111103
[  177.203125] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20023040
[  177.363281] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.367187] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[  177.390625] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[  177.414062] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[  177.417968] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[  177.425781] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2002B040
[  177.433593] radeon 0000:01:05.0: GPU reset succeed
[  177.605468] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.761718] radeon 0000:01:05.0: Wait for MC idle timedout !
[  177.804687] radeon 0000:01:05.0: WB enabled
[  178.000000] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
[  178.007812] [drm:r600_resume] *ERROR* r600 startup failed on resume
[  178.988281] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
[  178.996093] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
[  179.003906] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(6).
...

What could cause a "GPU lockup"? Why didn't the reset work? Any ideas?
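
The failed ring test in the log above reports scratch(0x8504)=0xCAFEDEAD. Below is a minimal sketch of what that value suggests, assuming the test works by seeding a scratch register with 0xCAFEDEAD, queuing a ring packet that should overwrite it with 0xDEADBEEF, and then polling the register. If the sentinel is still there when polling gives up, the CP never executed the packet. The struct and all sketch_* names are hypothetical and only model the idea; this is not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SENTINEL 0xCAFEDEADu /* seeded by the CPU before the test */
#define SKETCH_EXPECTED 0xDEADBEEFu /* value the CP is asked to write */

/* Hypothetical model of the GPU state that matters for a ring test. */
struct sketch_gpu {
    bool cp_running;     /* false models a stalled command processor */
    bool write_pending;  /* a "write scratch" packet sits in the ring */
    uint32_t scratch;    /* the scratch register being polled */
};

/* One step of the modeled CP: a running CP consumes the pending packet. */
static void sketch_cp_step(struct sketch_gpu *gpu)
{
    if (gpu->cp_running && gpu->write_pending) {
        gpu->scratch = SKETCH_EXPECTED;
        gpu->write_pending = false;
    }
}

/* Returns 0 if the CP wrote the expected value within max_polls, else -1. */
static int sketch_ring_test(struct sketch_gpu *gpu, unsigned int max_polls)
{
    unsigned int i;

    gpu->scratch = SKETCH_SENTINEL;  /* seed the sentinel */
    gpu->write_pending = true;       /* queue the packet on the ring */

    for (i = 0; i < max_polls; i++) {
        sketch_cp_step(gpu);
        if (gpu->scratch == SKETCH_EXPECTED)
            return 0;
    }
    return -1;  /* scratch still holds the sentinel: CP made no progress */
}
```

In this model, a stalled CP leaves 0xCAFEDEAD in the scratch register, which matches the value printed after the soft reset and would mean the CP did not come back to life.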

BTW, one question:
I see 'RADEON_IS_PCI | RADEON_IS_IGP' in rdev->flags, which causes
need_dma32 to be set.
Is that correct? (drivers/char/agp is not available on MIPS; could that be the
reason?)
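
A minimal sketch of the decision in question, assuming the driver forces 32-bit DMA whenever the AGP or the plain-PCI bus flag is set, regardless of the IGP flag. The SKETCH_* bit values and the helper name are made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the values are assumptions for this sketch,
 * not copied from the kernel headers. */
#define SKETCH_IS_IGP  (1u << 0)
#define SKETCH_IS_AGP  (1u << 1)
#define SKETCH_IS_PCIE (1u << 2)
#define SKETCH_IS_PCI  (1u << 3)

/* Hypothetical helper: AGP and plain-PCI devices are limited to 32-bit DMA;
 * the IGP flag alone does not change the result. */
static bool sketch_need_dma32(uint32_t flags)
{
    bool need_dma32 = false;

    if (flags & SKETCH_IS_AGP)
        need_dma32 = true;
    if (flags & SKETCH_IS_PCI)
        need_dma32 = true;
    return need_dma32;
}
```

Under that assumption, 'RADEON_IS_PCI | RADEON_IS_IGP' would set need_dma32 because of the PCI bit alone, so the open question is why the device ends up flagged as PCI on this platform.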

On September 28, 2011 at 3:23 PM, <chenhc at lemote.com> wrote:

> Hi Alex,
>
> When we do STR (S3) with an RS780E radeon card on a MIPS platform, a "GPU
> reset" may happen after resume (in about 5% of cases). After that,
> X is unusable.
>
> We know there is a "ring test" at system resume time and at GPU reset time.
> Whether or not a GPU reset happens, the "ring test" at system resume time
> always succeeds. But the "ring test" at GPU reset time usually fails.
>
> We use the latest kernel (3.1.0-RC8 from git) and X.org is 7.6.
>
> Any ideas?
>
> Best regards,
> Huacai Chen
>
>

Regards,
- Chen Jie