[Intel-gfx] i915: severe lag after resume (was Re: i915: hotplug events gone wild)

Andrew Lutomirski luto at mit.edu
Mon Feb 8 13:25:57 CET 2010


On Thu, Feb 4, 2010 at 1:11 PM, Jesse Barnes <jbarnes at virtuousgeek.org> wrote:
> On Thu, 4 Feb 2010 11:41:48 -0500
> Andrew Lutomirski <luto at mit.edu> wrote:
>
>> On Sun, Jan 31, 2010 at 9:54 PM, Andrew Lutomirski <luto at mit.edu>
>> wrote:
>> > On Sun, Jan 31, 2010 at 8:03 PM, ykzhao <yakui.zhao at intel.com>
>> > wrote:
>> >> On Sun, 2010-01-31 at 19:49 +0800, Andrew Lutomirski wrote:
>> >>> On Sat, Jan 30, 2010 at 10:02 PM, Andrew Lutomirski
>> >>> <luto at mit.edu> wrote:
>> >>> > [I posted this bug earlier with a terrible description as
>> >>> > "resume lagginess and other problems."  Here it is again with a
>> >>> > better bug report.]
>> >>> >
>> >>> > I'm running 2.6.33-rc5 (plus some wireless-testing stuff, but
>> >>> > I've seen this problem on a variety of 2.6.33-rc? kernels).
>> >>> >  Every now and then, X starts to lag badly on my GM45 laptop.
>> >>> >  When this happens, I usually see a bunch of events in
>> >>> > udevmonitor.  Running with drm.debug=0x02 (and the patch below
>> >>> > to keep the log under control), I see tons of messages like
>> >>> > this:
>> >>>
>> >>> I triggered it again.  This time, the messages looked like
>> >>> (drm.debug=3 from a different VT to avoid all the hotplug stuff
>> >>> running off the screen, and running a different debugging hack --
>> >>> see all the way at the bottom):
>> >>>
>> >>> [ 1324.285057] [drm:i915_driver_irq_handler], hotplug event
>> >>> received, stat 0x28200000, mask 0x38000800
>> >>
>> >> From the stat value it seems that this is related to the HDMI
>> >> hotplug.
>> >> Will you please confirm whether the HDMI is plugged/unplugged in your
>> >> test?
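The hotplug line in the log can be decoded bit by bit. The sketch below is an illustration and not part of the original report. It assumes that bits 27 to 29 of the hotplug status register are the HDMI/DP port D, C and B hotplug status bits, as in the 2.6.33-era i915_reg.h, which fits the HDMI reading quoted above. The port labels are assumptions; check them against your kernel's header.

```python
# Decode the debug line: "hotplug event received, stat 0x28200000, mask 0x38000800"
# ASSUMPTION: bits 27-29 are the HDMI/DP port D/C/B hotplug status bits
# (2.6.33-era i915_reg.h); verify against your kernel's header.

STAT = 0x28200000
MASK = 0x38000800

# Hypothetical label table; bits not listed are reported as "bit N".
ASSUMED_PORT_BITS = {29: "HDMI/DP B", 28: "HDMI/DP C", 27: "HDMI/DP D"}


def set_bits(value: int) -> list:
    """Return the indices of the set bits in a 32-bit value, highest first."""
    return [bit for bit in range(31, -1, -1) if value & (1 << bit)]


def describe(value: int) -> list:
    """Label each set bit with its assumed port name, or 'bit N' if unknown."""
    return [ASSUMED_PORT_BITS.get(bit, f"bit {bit}") for bit in set_bits(value)]


if __name__ == "__main__":
    print("stat:       ", describe(STAT))
    print("mask:       ", describe(MASK))
    print("stat & mask:", describe(STAT & MASK))
```

Under that assumption, `stat & mask` leaves only bits 29 and 27 (ports B and D). Bit 21 is set in `stat` but is not covered by the mask.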
>> >
>> > This is a Lenovo X200s, and it doesn't have HDMI.  I have LVDS on
>> > and everything else (i.e. VGA and the docking station, which has a
>> > DP port) disconnected.
>> >
>>
>> I don't think this is a hotplug bug.  I don't remember seeing it back
>> in early January (i.e. before my laptop died and got its motherboard
>> replaced).  The bug is present in 2.6.32.7 and in 2.6.33-rc6.
>>
>> I think it's a bad interaction between some kind of idle code and
>> suspend/resume.  I can't trigger it without suspending and resuming at
>> least once after reboot, and I can't make it go away completely once
>> it starts triggering.
>>
>> Once the bug starts, it seems to manifest in one of two forms.
>>
>> Bug form 1 (the bad one): X lags so badly that I can hardly do
>> anything.  The mouse seems to update only twice a second or so.
>> Compositing gets so slow that I can't use any programs.  I can switch
>> VTs and use the console, but switching back to X doesn't fix it.
>> udevmonitor shows a huge flood of events (4/second, maybe).  Once this
>> starts, it keeps happening for quite a while or until I kill X.
>> Killing X seems to switch me to bug form 2.
>>
>> Bug form 2 (the less bad one): When X is idle, my mouse seems to skip
>> once or twice a second.  When X is not idle (e.g. I'm dragging a
>> window), everything is fine.  intel_gpu_top seems to suppress the bug
>> and shows nothing useful.  udevmonitor shows a slow stream of hotplug
>> events.
>>
>> In either case, clearing the high bits of PORT_HOTPLUG_EN using
>> intel_reg_write (i.e. writing 0x320 to 0x61110) stops the hotplug
>> events but *does not* fix the lag.  (It is more reliable in stopping
>> the hotplug events if I patch the dp detect code to not change the
>> high bits back.)
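The bits that the 0x320 write leaves enabled can be checked with a short sketch. It assumes, per the 2.6.33-era i915_reg.h, that bits 25 to 29 of PORT_HOTPLUG_EN are the SDVO and HDMI/DP hotplug interrupt enables. `clear_high_enables` is a hypothetical helper that shows a read-modify-write alternative to writing a fixed value.

```python
PORT_HOTPLUG_EN = 0x61110  # register offset quoted in the message
WRITTEN = 0x320            # value written with intel_reg_write

# ASSUMPTION: bits 25-29 of PORT_HOTPLUG_EN enable the SDVO B/C and
# HDMI/DP B/C/D hotplug interrupts (2.6.33-era i915_reg.h).
ASSUMED_HIGH_ENABLES = 0x3E000000


def set_bits(value: int) -> list:
    """Return the indices of the set bits in a 32-bit value, highest first."""
    return [bit for bit in range(31, -1, -1) if value & (1 << bit)]


def clear_high_enables(current: int) -> int:
    """Hypothetical helper: clear only the assumed high enable bits and
    keep every other bit of the current 32-bit register value."""
    return current & ~ASSUMED_HIGH_ENABLES & 0xFFFFFFFF


if __name__ == "__main__":
    print("bits set by 0x320:", set_bits(WRITTEN))
    print("high enables left:", hex(WRITTEN & ASSUMED_HIGH_ENABLES))
```

Writing 0x320 outright presumably assumes the register's low bits already read 0x320. A read-modify-write like `clear_high_enables` leaves the low bits as they were.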
>>
>> These problems seem to start one minute or so after resuming.  They're
>> bad enough that suspend/resume is almost unusable.
>>
>> Userspace is F12.
>>
>> If it helps at all, I started noticing this bug at the same time that
>> I noticed that writing 1 to reset in sysfs breaks graphics.  (It used
>> to work.)
>
> So if you use powersave=0 you don't see the lag?  You could try
> increasing the idle timer timeout; it's 1000ms now, you could make it
> 5000ms or so, but I don't think we do anything when transitioning
> to/from idle that would take long enough to cause huge lag...
>

I tested a bunch more combinations.

powersave=0 does not prevent the lag.

The lag is present on 2.6.32.1, 2.6.32.7, and 2.6.33-rc6.  It seems
easier to trigger on 2.6.33-rc6 and 2.6.32.7, but I'm not sure exactly
what triggers it in the first place.

I triggered it once without suspending and resuming.

I watched 'top' once, and Xorg's CPU usage stayed at ~50% (i.e. one
core), but if I dragged a window slowly, the lag stopped and CPU went
*down*.  intel_gpu_top shows nothing that appears interesting.

Manually clearing the low bit of PWRCTXA did not seem to stop the lag
once it started (I figured that would be worth a try).

There is never any lag on a framebuffer console, even if I do see
hotplug events.

Any more ideas?  This is making my laptop rather difficult to use.

--Andy


