[Intel-gfx] debugging Haswell eDP black screen after S3

Ben Guthro ben at guthro.net
Tue May 28 22:43:25 CEST 2013


On Tue, May 21, 2013 at 1:28 PM, Ben Guthro <ben at guthro.net> wrote:
> On Tue, May 21, 2013 at 10:02 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> On Tue, May 21, 2013 at 3:44 PM, Ben Guthro <ben at guthro.net> wrote:
>>>> This will break kms since now you have the vbios and the linux kms driver
>>>> fighting over the same piece of hw. Does
>>>>
>>>> xset dpms force off
>>>> xset dpms force on
>>>>
>>>> cause similar issues?
>>>
>>> No, these work as expected (on 3.8)
>>> I didn't realize that these broke with KMS. I'll stick with the S3 reproduction.
>>
>> Ok, so things are at least not terribly broken.
>>
>>>> If not please make sure that vbetool isn't badly interfering with the
>>>> kernel modeset driver on suspend/resume. At least looking at your dmesg
>>>> and reg dumps vbe wreaking havoc with the kms driver seems like a rather
>>>> likely scenario. Also, can you please test latest 3.10-rc kernels?
>>>
>>> 3.10-rc2 doesn't seem to work at all - it boots to a black screen every time.
>>
>> That otoh is ugly. Could be that though that this is the same (or a
>> similar bug) to your resume issue - in the last few kernel releases
>> we've tried very hard to unify the code between initial driver load at
>> boot-up and resume.
>
> Perhaps I should qualify "at all"
>
> It seems that it fails somewhat late in the boot process. If I remove
> the "boot splash" cli params, I can see it transition into the high
> res mode, and seemingly get into init.
> However, even if I boot to single user mode, the screen goes black.
>
> Unfortunately, both times I tried to test this, and then reboot, I
> ended up at a "grub rescue" prompt, with an unusable system.
>
>>
>> So can you please try to bisect where the boot-up regression has been
>> introduced between 3.8 and 3.10-rc2?
>
> I'm not sure I'll be able to do this.
> With the failure condition I describe above, I am unable to even ssh
> into this machine to debug, nevermind install a new kernel.
> This means I need to generate a new kernel, and install kit with that
> kernel for every bisection test.
>
> This may be more time than I am able to dedicate to this problem - but I'll try.
>
> Ben

It appears I did not CC the list on my last 2 replies.
My apologies - I'll re-paste them below.


I tried to bisect this but was unsuccessful: I couldn't find a
reproducible test case that got me back into the failure condition.
It always seemed to succeed for me, which of course makes bisecting
nearly impossible.

I tried updating to 3.10-rc3... well, actually to this changeset at the
tip of Linus' tree:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=58f8bbd2e39c3732c55698494338ee19a92c53a0

I can get X to come up now on this machine, albeit very slowly.
Once it comes up, it seems to hang and then respawn.

I get a lot of these in the log now, as well:

[  392.195734] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Things in the log that look suspicious to me are:
[   34.293452] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[   34.293486] [drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28


I get the following errors in the X log, which prevent it from coming up:
[    76.142] (EE) intel(0): failed to set mode: No space left on device
[    76.142]
Fatal server error:
[    76.142] AddScreen/ScreenInit failed for driver 0
[    76.142]
[    76.142] (EE)
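
The kernel's "err = -28" and the X server's "No space left on device"
appear to be the same error seen from two sides: on Linux, errno 28 is
ENOSPC. A minimal sketch for decoding such values from a log line follows.
The helper name decode_kernel_err is hypothetical, and the errno numbering
assumed is the Linux one. The message is only the generic errno text, so it
does not by itself say which resource ran out.

```python
import errno
import os
import re


def decode_kernel_err(line):
    """Pull 'err = -N' out of a log line and return (errno name, message).

    Returns None when the line carries no such field.
    """
    match = re.search(r"err = -(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to symbolic names on the running platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Line taken from the dmesg excerpt above (unwrapped).
line = ("[   34.293486] [drm:intel_crtc_set_config] *ERROR* "
        "failed to set mode on [CRTC:3], err = -28")
print(decode_kernel_err(line))
```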



Xorg also crashes in the following manner:


[   218.876] (EE) Backtrace:
[   218.880] (EE) 0: X (xorg_backtrace+0x34) [0x7fe44fff9754]
[   218.880] (EE) 1: X (0x7fe44fe44000+0x1b96a9) [0x7fe44fffd6a9]
[   218.880] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fe44f16a000+0xfcb0) [0x7fe44f179cb0]
[   218.880] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (0x7fe44ddcf000+0x148c6b) [0x7fe44df17c6b]
[   218.880] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fe44cb5a000+0x17c36) [0x7fe44cb71c36]
[   218.880] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fe44cb5a000+0x19857) [0x7fe44cb73857]
[   218.880] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fe44cb5a000+0xed429) [0x7fe44cc47429]
[   218.880] (EE) 7: X (0x7fe44fe44000+0x13e8ac) [0x7fe44ff828ac]
[   218.880] (EE) 8: X (0x7fe44fe44000+0x5239e) [0x7fe44fe9639e]
[   218.880] (EE) 9: X (0x7fe44fe44000+0x557a1) [0x7fe44fe997a1]
[   218.880] (EE) 10: X (0x7fe44fe44000+0x4415a) [0x7fe44fe8815a]
[   218.880] (EE) 11: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xed) [0x7fe44ddf076d]
[   218.880] (EE) 12: X (0x7fe44fe44000+0x444b1) [0x7fe44fe884b1]
[   218.880] (EE)
[   218.880] (EE) Bus error at address 0x7fe44a6c9080
[   218.880]
Fatal server error:
[   218.881] Caught signal 7 (Bus error). Server aborting
[   218.881]
[   218.881] (EE)


I recognize that this isn't terribly helpful without the symbol
resolution. I tried installing debug symbols, but they didn't seem to
help.
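
One possible way to resolve the frames by hand: each unresolved frame
already gives the module's load base and the offset into it, so the offset
can be fed to addr2line against a copy of the module that has debug info.
The sketch below only parses the backtrace and builds the command lines; it
does not run them. The helper names are hypothetical. It assumes the offsets
are load-relative (usual for shared objects and position-independent
executables) and that the debug info matches the exact binaries that
crashed. The module shown as "X" would need to be replaced by the real path
of the server binary.

```python
import re

# Matches frames printed as "N: module (0xBASE+0xOFFSET) [0xADDR]".
# Frames that already carry a symbol name, e.g. (xorg_backtrace+0x34),
# are skipped on purpose because they are resolved.
FRAME_RE = re.compile(
    r"(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<base>0x[0-9a-fA-F]+)\+(?P<offset>0x[0-9a-fA-F]+)\)\s+"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return (index, module, offset) for each unresolved backtrace frame."""
    # Mail clients wrap long frames; rejoin continuation lines that
    # start with "(" before matching.
    joined = re.sub(r"\n(?=\()", " ", log_text)
    frames = []
    for m in FRAME_RE.finditer(joined):
        base = int(m.group("base"), 16)
        offset = int(m.group("offset"), 16)
        addr = int(m.group("addr"), 16)
        # Keep only frames where base + offset matches the absolute address.
        if base + offset == addr:
            frames.append((int(m.group("index")), m.group("module"), offset))
    return frames


def addr2line_command(module, offset):
    """Build an addr2line invocation: -f prints function names,
    -C demangles, -e selects the file to inspect."""
    return ["addr2line", "-f", "-C", "-e", module, hex(offset)]


# Two frames copied from the backtrace above, one of them still wrapped.
SAMPLE = """\
[   218.880] (EE) 1: X (0x7fe44fe44000+0x1b96a9) [0x7fe44fffd6a9]
[   218.880] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so
(0x7fe44cb5a000+0x17c36) [0x7fe44cb71c36]
"""

for index, module, offset in parse_frames(SAMPLE):
    print(index, " ".join(addr2line_command(module, offset)))
```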

Ben


