[Intel-gfx] xserver crash with linux 4.6.0-rc3 and later

Chris Wilson chris at chris-wilson.co.uk
Fri Apr 29 17:51:13 UTC 2016


On Fri, Apr 29, 2016 at 01:25:30PM -0400, John S Gruber wrote:
> Starting with linux 4.6.0-rc3 my Ubuntu Wily system no longer allows logons
> due to an immediate abort in xserver just after entering my
> userid and password. (lightdm drew the sign on screen OK).
> 
> The xserver problem seems to result from a null reference from
>  __kgem_retire_rq from package xserver-xorg-video-intel version
> 2:2.99.917+git20150808-0ubuntu4.
> 
> Bisecting the kernel I found that this was triggered by commit
> 426960bed3217f72a1b7bb94f084d79cc616ec0f. Reverting this commit based on
> 4.6-rc5 eliminated my crash.
> 
> The problem was specific to my HP Pavilion laptop with Intel HD 5500
> integrated graphics. A desktop Acer, also using Intel graphics, was
> fine. On the laptop it was completely consistent.
> 
> The laptop has:
> 
> 00:02.0 VGA compatible controller: Intel Corporation Broadwell-U
> Integrated Graphics (rev 09) (prog-if 00 [VGA controller])
>     DeviceName: Intel(R) Graphics GT2
> 
> Testing the laptop with Ubuntu xenial (with xserver-xorg-video-intel
> version 2:2.99.917+git20160325-1ubuntu1) was fine, however.
> 
> Please let me know if this is problematic, and if so, if I should provide
> additional information. I don't follow the list.
> 
> ----------------------
> 
> The triggering commit:
> 
> drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids

The seeds of that crash were already sown. The error is that on a batch
buffer allocation failure, the preallocated failsafe ended up on the
request list (which is not supposed to happen and so it runs off the end
of the list).
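
As a rough illustration of that failure mode (a simplified sketch, not the
actual sna/kgem code; the struct layout and the names request_alloc,
request_submit, requests_retire and static_request are hypothetical stand-ins),
the failsafe request carries no bo, so it must never be queued for retirement:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real kgem structures. */
struct bo {
	int busy;
};

struct request {
	struct bo *bo;        /* NULL for the failsafe request */
	struct request *next; /* singly linked list of outstanding requests */
};

static struct request static_request; /* preallocated failsafe, bo == NULL */
static struct request *request_list;  /* requests awaiting retirement */

/* Allocate a request; fall back to the failsafe when allocation fails.
 * simulate_failure is an assumption of this sketch, used to force the
 * fallback path. */
static struct request *request_alloc(int simulate_failure)
{
	struct request *rq = simulate_failure ? NULL : malloc(sizeof(*rq));

	if (rq == NULL)
		return &static_request;

	rq->bo = malloc(sizeof(*rq->bo));
	if (rq->bo == NULL) {
		free(rq);
		return &static_request;
	}

	rq->bo->busy = 0;
	rq->next = NULL;
	return rq;
}

/* Queue a request for later retirement. The failsafe is rejected, since
 * the retire loop below assumes every queued request has a valid bo. */
static int request_submit(struct request *rq)
{
	if (rq == &static_request)
		return -1;

	rq->next = request_list;
	request_list = rq;
	return 0;
}

/* Retire idle requests from the head of the list; returns the number
 * retired. */
static int requests_retire(void)
{
	int retired = 0;

	while (request_list != NULL) {
		struct request *rq = request_list;

		assert(rq->bo != NULL); /* would fire if the failsafe were queued */
		if (rq->bo->busy)
			break;

		request_list = rq->next;
		free(rq->bo);
		free(rq);
		retired++;
	}

	return retired;
}
```

Without the check in request_submit, the retire loop in this sketch would reach
a request whose bo is NULL, which is the kind of null dereference described in
the report.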

commit 69d8edc11173df021aa2e158b2530257113141fd
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Aug 7 10:08:17 2015 +0100

    sna: Handle batch allocation failure
    
    Whilst we currently do not try and submit a failed batch buffer
    allocation, we still treat it as a valid request. This explodes much
    later when we inspect the NULL rq->bo.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=91577
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

is the cause of the crash, but

commit 2d26643cab33a32847afaf13b50d326d09d58bf7
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Nov 13 19:03:36 2015 +0000

    sna/dri2: Drop the reference on the fence when complete
    
    Fixes regression from
    
    commit 8d9e496670f48b4eec64dfe1bcedb49793cf3073
    Author: Chris Wilson <chris at chris-wilson.co.uk>
    Date:   Wed Jul 22 11:14:01 2015 +0100
    
        sna/dri2: Take over the placeholder vblank
    
    After noting the fence was complete, we would clear it. But I forgot
    that we actually held a reference on to it, and so we would leak the 64k
    batch, and starve the system of available memory in about 18 minutes of
    SwapBuffers.
    
    Reported-by: Arkadiusz Miskiewicz <arekm at maven.pl>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92911
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

is where the bug began. The kernel just made it easier to hit the
pre-existing bugs in userspace.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

