Debugging Thinkpad T430s occasional suspend failure.

Linus Torvalds torvalds at linux-foundation.org
Sat Feb 16 15:02:11 PST 2013


On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd at google.com> wrote:
>
> I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> yesterday, to save an oopsing core kernel ip there, instead of hashed
> pm trace info (it makes sense in this case to invert your sequence,
> putting the high order into years and the low order into minutes).

That sounds like a good idea in general. The PM_TRACE() thing was done
to figure out things that locked up the PCI bus etc, but encoding
oopses during suspend sounds really useful too.

Is your patch clean enough to just be made part of the standard
PM_TRACE infrastructure, or was it something really hacky and one-off?
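
For illustration, here is a minimal sketch of the kind of encoding being
discussed: a value is packed into RTC date fields as a mixed-radix number,
with the high-order part in the year and the low-order part in the minutes,
so that the date seen after reboot can be decoded back into the value. This
is not Hugh's patch or the kernel's set_magic_time(). The names magic_encode,
magic_decode and struct magic_time, and the field ranges, are assumptions
chosen for the example. With these ranges the capacity is 48,384,000 values,
roughly 25.5 bits, which is in the neighbourhood of the "~26 bits" mentioned
below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the RTC fields used as storage. */
struct magic_time {
	unsigned year;	/* 0..99 */
	unsigned mon;	/* 0..11 */
	unsigned mday;	/* 1..28, valid in every month */
	unsigned hour;	/* 0..23 */
	unsigned min;	/* 0..59 */
};

/* Radix of each field (assumed ranges, not taken from the kernel). */
enum { YEARS = 100, MONTHS = 12, DAYS = 28, HOURS = 24, MINUTES = 60 };

/* 100 * 12 * 28 * 24 * 60 = 48,384,000 distinct values. */
#define MAGIC_CAPACITY ((uint32_t)YEARS * MONTHS * DAYS * HOURS * MINUTES)

/* Pack n into date fields: low order in minutes, high order in years. */
static struct magic_time magic_encode(uint32_t n)
{
	struct magic_time t;

	assert(n < MAGIC_CAPACITY);
	t.min  = n % MINUTES;	n /= MINUTES;
	t.hour = n % HOURS;	n /= HOURS;
	t.mday = n % DAYS + 1;	n /= DAYS;
	t.mon  = n % MONTHS;	n /= MONTHS;
	t.year = n;		/* remainder is below YEARS */
	return t;
}

/* Inverse of magic_encode(): rebuild n from the date fields. */
static uint32_t magic_decode(struct magic_time t)
{
	uint32_t n = t.year;

	n = n * MONTHS  + t.mon;
	n = n * DAYS    + (t.mday - 1);
	n = n * HOURS   + t.hour;
	n = n * MINUTES + t.min;
	return n;
}
```

One way to use this, again as an assumption, is to store only the low 25 bits
of the faulting ip (any 25-bit value fits below the capacity) and to add back
the kernel text base after decoding, which works only if the kernel text
spans less than 32MB from that base.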

> Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
>
> /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
>          * enable dithering as needed, but that costs bandwidth.  So choose
>          * the minimum value that expresses the full color range of the fb but
>          * also stays within the max display bpc discovered above.
>          */
>
>         switch (fb->depth) {
> ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
>
> (gcc chose to pass a pointer to fb->depth down to the function,
> instead of fb itself, since that is the only use of it there.)
>
> I expect that fb is NULL; but with an average of one failure to resume
> per day, and ~26 bits of info per crash, this is not a fast procedure!
>
> I notice that intel_pipe_set_base() allows for NULL fb,
> so I'm currently running with the oops-in-rtc hackery, plus
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {
>
> There's been a fair bit of change to intel_display.c since 3.7 (if
> my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> ironlake_, but I've not yet spotted anything obvious; nor yet looked
> to see where fb would originate from anyway.
>
> Once I've got just a little more info out of it, I'll start another
> thread addressed principally to the drm/gpu/i915 guys.

I think it's worth giving them a heads-up already, so I've cc'd
the main suspects here.

Daniel, Dave - any comments about a NULL fb in
intel_choose_pipe_bpp_dither() during either suspend or resume? Some
googling shows this:

    https://bugzilla.redhat.com/show_bug.cgi?id=895123

which sounds remarkably similar, and is also during a suspend attempt
(but apparently Satish got a full oops out). Could it be some timing
race with a worker entry?

                        Linus

