[[REGRESSION]: hibernate/sleep regression w/ bisection] SOLVED?

Thu Jun 7 09:36:13 PDT 2012

Tejun/Jerome (and radeon devs):

I'd like to bring a suspend/resume radeon bug full circle (see:
http://thread.gmane.org/gmane.linux.kernel/1209587 for complete thread and
Tejun's excellent summary below).

The problem was triggered by new input serio driver code (commit
8ee294cd9def000 found through bisection). Don't ask me why, but that set it
off.

In a nutshell, X would intermittently lock up across suspend/resume cycles.

The issue remained at least through 3.1.5. I skipped 3.2.x altogether and
went straight to kernel 3.3.4, where I can confirm the problem seems to be gone.

What might have happened in the radeon codebase since 3.1.x that would have
addressed this either intentionally or as a side-effect? Maybe 721604a15b934f
or 9fc04b503df9a3?

Thanks.

~ Andy

----- Forwarded message from Tejun Heo <tj at kernel.org> -----

Date: Fri, 4 Nov 2011 09:14:31 -0700
From: Tejun Heo <tj at kernel.org>
Subject: Re: [REGRESSION]: hibernate/sleep regression w/ bisection
To: Andrew Watts <akwatts at ymail.com>
Cc: Dmitry Torokhov <dmitry.torokhov at gmail.com>, linux-kernel at vger.kernel.org,
	linux-pm at lists.linux-foundation.org, David Airlie <airlied at linux.ie>,
	dri-devel at lists.freedesktop.org

(cc'ing David Airlie and dri-devel)

Hello, the original thread can be read from

  http://thread.gmane.org/gmane.linux.kernel/1209587

Full sysrq-t output at

  http://article.gmane.org/gmane.linux.kernel/1211256

So, the problem is that after a seemingly unrelated update to input
serio driver (convert to use workqueue), X seems to lock up
sporadically across suspend/resume cycles.

I went through the full sysrq-t output but couldn't spot anything
suspicious w/ anything else.  No worker is stuck and nobody is waiting
for flush to finish.

Stack trace for X follows.

> X               S f499b944  5800  1652   1651 0x00400080
>  f499b9a8 00003086 00000000 f499b944 c100d4a4 00000000 00000000 f499b958
>  00000000 f499b9a8 f5173140 d7857c56 00000057 f5173140 d8b69880 00000057
>  00000001 00000000 f499b9b4 c104dd89 000f4240 00000000 00000000 f499ba68
> Call Trace:
>  [<c1291301>] ttm_bo_wait_unreserved+0x5f/0x106
>  [<c129145f>] ttm_bo_reserve_locked+0xb7/0xe1
>  [<c1292c27>] ttm_bo_reserve+0x26/0x95
>  [<c12c3c97>] radeon_crtc_do_set_base+0xbd/0x6d2
>  [<c12c42e7>] radeon_crtc_set_base+0x1b/0x1d
>  [<c12c430d>] radeon_crtc_mode_set+0x24/0xdd7
>  [<c1279c57>] drm_crtc_helper_set_mode+0x32c/0x48b
>  [<c1279e2f>] drm_helper_resume_force_mode+0x79/0x23e
>  [<c12ace10>] radeon_gpu_reset+0x84/0x98
>  [<c12c0838>] radeon_fence_wait+0x2d1/0x311
>  [<c12c0e37>] radeon_sync_obj_wait+0xc/0xe
>  [<c12908be>] ttm_bo_wait+0xa1/0x108
>  [<c12d6e7b>] radeon_gem_wait_idle_ioctl+0x76/0xc4
>  [<c127e62e>] drm_ioctl+0x1c2/0x42c
>  [<c10e288e>] do_vfs_ioctl+0x79/0x54b
>  [<c10e2dcb>] sys_ioctl+0x6b/0x70
>  [<c1593813>] sysenter_do_call+0x12/0x22

Do you guys have any ideas what's going on?  It seems to be waiting
for bo->reserved to go to zero.  Is it possible that someone there is
forgetting to properly kick a work item after resume causing the wait
to stall?

Andrew, can you please kill the X server after the hang and see
whether that brings the system back?  I think sshd should still work
and if not you can write a script to kill the X server 30 seconds
after resume (and kill that script if resume succeeds).

Thank you.

-- 
tejun

----- End forwarded message -----
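
For anyone wanting to reproduce Tejun's test, the watchdog he describes
(kill the X server 30 seconds after resume unless the script itself is
killed first) could be sketched roughly as below. This is only an
illustration, not something from the original thread: the process name
"X" is an assumption (it may be "Xorg" on your system), `pgrep -x` is
assumed to be available, the function names are made up, and the script
needs enough privilege to signal the server.

```python
#!/usr/bin/env python3
"""Post-resume watchdog sketch: kill the X server unless we are killed first."""
import os
import signal
import subprocess
import sys
import time


def find_pids(name):
    """Return PIDs of processes whose name matches exactly (uses pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", name],
                            capture_output=True, text=True)
    return [int(tok) for tok in result.stdout.split()]


def watchdog(name="X", delay=30.0, sleep=time.sleep,
             find=find_pids, kill=os.kill):
    """Wait `delay` seconds, then send SIGTERM to every process called `name`.

    If resume succeeded, the user kills this script during the wait and
    nothing happens. Returns the list of PIDs that were signalled.
    """
    sleep(delay)
    signalled = []
    for pid in find(name):
        try:
            kill(pid, signal.SIGTERM)
            signalled.append(pid)
        except ProcessLookupError:
            # The process exited between the lookup and the kill.
            pass
    return signalled


# Require an explicit delay argument so that running the file without
# arguments (or importing it) never signals anything.
if __name__ == "__main__" and len(sys.argv) == 2:
    print("signalled:", watchdog(delay=float(sys.argv[1])))
```

Started just before suspending, or from a resume hook, as
`python3 watchdog.py 30`, it leaves a 30 second window in which to kill
it by hand if X comes back normally. SIGTERM is used rather than SIGKILL
so the server gets a chance to shut down cleanly; whether a hung X
responds to it at all is exactly what the test would show.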