[Bugme-new] [Bug 16488] New: [i915] Framebuffer ID error after suspend/hibernate leading to X crash

Linus Torvalds torvalds at linux-foundation.org
Tue Aug 3 08:47:35 PDT 2010


On Tue, Aug 3, 2010 at 12:25 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Mon, 2 Aug 2010 16:55:03 -0700, Andrew Morton <akpm at linux-foundation.org> wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Sun, 1 Aug 2010 08:55:49 GMT
>> bugzilla-daemon at bugzilla.kernel.org wrote:
>>
>> > https://bugzilla.kernel.org/show_bug.cgi?id=16488
>>
>> Innocuous-looking one-liner is said to have made Milan's X server even
>> worse than normal.
>
> We go from a random OOPS to a consistent error (and a failing userspace).
> It sounds more likely that we have uncovered a real bug, probably in
> the ddx.

I can't really imagine that that one-liner made the difference. Not
under any normal load. I suspect it just changes some allocation
pattern very subtly, and then the memory scribble (or whatever) that
really causes the bug perhaps changes.

The original oops reported in launchpad was

  BUG: unable to handle kernel NULL pointer dereference at 00000108
  IP: [<f8578b97>] intel_release_load_detect_pipe+0x27/0xb0 [i915]

and as far as I can tell, that's due to a load off a NULL crtc, here:

        struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;

the disassembly is

   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	83 ec 14             	sub    $0x14,%esp
   6:	89 5d f4             	mov    %ebx,-0xc(%ebp)
   9:	89 75 f8             	mov    %esi,-0x8(%ebp)
   c:	89 7d fc             	mov    %edi,-0x4(%ebp)
   f:	0f 1f 44 00 00       	nopl   0x0(%eax,%eax,1)
  14:	8b b0 ec 02 00 00    	mov    0x2ec(%eax),%esi   # crtc = encoder->crtc
  1a:	89 c3                	mov    %eax,%ebx
  1c:	8b 80 f4 02 00 00    	mov    0x2f4(%eax),%eax   # dev = encoder->dev
  22:	89 d7                	mov    %edx,%edi
  24:	89 45 f0             	mov    %eax,-0x10(%ebp)
  27:*	8b 8e 08 01 00 00    	mov    0x108(%esi),%ecx     <-- trapping
instruction (crtc_funcs = crtc->helper_private)
  2d:	80 bb 04 03 00 00 00 	cmpb   $0x0,0x304(%ebx)   #
intel_encoder->load_detect_temp
  34:	75 2a                	jne    0x60
  36:	0f b6 46 18          	movzbl 0x18(%esi),%eax  # crtc->enabled
  3a:	84 c0                	test   %al,%al

in case anybody cares. However, I have no idea how ctrc would be NULL
in the first place there, it comes from

        struct drm_encoder *encoder = &intel_encoder->enc;
        ...
        struct drm_crtc *crtc = encoder->crtc;

and I don't know the setup code. It _does_ strike me that the C code does:

        ...
        struct drm_crtc *crtc = encoder->crtc;
        struct drm_encoder_helper_funcs *encoder_funcs =
encoder->helper_private;
        struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private;

        if (intel_encoder->load_detect_temp) {
                encoder->crtc = NULL;
                connector->encoder = NULL;
        ....

where I react to the fact that first we load "crtc = encoder->crtc"
and dereference that pointer (crtc->helper_private) without checking
whether it might be NULL, and then in some case we clear that field
(encoder->crtc = NULL), so clearly the whole "encoder->crtc" field
_can_ be NULL.

However, I don't see why it should only show up for some people...

                          Linus


More information about the dri-devel mailing list