[Intel-gfx] xserver crash with linux 4.6.0-rc3 and later
Chris Wilson
chris at chris-wilson.co.uk
Fri Apr 29 17:51:13 UTC 2016
On Fri, Apr 29, 2016 at 01:25:30PM -0400, John S Gruber wrote:
> Starting with linux 4.6.0-rc3 my Ubuntu Wily system no longer allows logons
> due to an immediate abort in xserver just after entering my
> userid and password. (lightdm drew the sign on screen OK).
>
> The xserver problem seems to result from a null reference from
> __kgem_retire_rq from package xserver-xorg-video-intel version
> 2:2.99.917+git20150808-0ubuntu4.
>
> Bisecting the kernel I found that this was triggered by commit
> 426960bed3217f72a1b7bb94f084d79cc616ec0f. Reverting this commit based on
> 4.6-rc5 eliminated my crash.
>
> The problem was specific to my HP Pavilion laptop with Intel HD 5500
> integrated graphics. A desktop Acer, also using Intel graphics, was
> fine. On the laptop it was completely consistent.
>
> The laptop has:
>
> 00:02.0 VGA compatible controller: Intel Corporation Broadwell-U
> Integrated Graphics (rev 09) (prog-if 00 [VGA controller])
> DeviceName: Intel(R) Graphics GT2
>
> Testing the laptop with Ubuntu xenial (with xserver-xorg-video-intel
> version 2:2.99.917+git20160325-1ubuntu1) was fine, however.
>
> Please let me know if this is problematic, and if so, if I should provide
> additional information. I don't follow the list.
>
> ----------------------
>
> The triggering commit:
>
> drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids
The seeds of that crash were already sown. The error is that on a batch
buffer allocation failure, the preallocated failsafe request ended up on
the request list. That is not supposed to happen, and so the code walking
the list runs off the end of it.
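
To illustrate the invariant involved, here is a minimal sketch in C. It is
not the actual sna/kgem code: struct batch_bo, struct request,
struct request_queue, queue_request() and retire_requests() are all
hypothetical names made up for this example. The assumption is simply that
the retire path dereferences rq->bo for every queued request, so a request
whose batch allocation failed (including the preallocated failsafe) has to
be kept off the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real kgem structures. */
struct batch_bo { int handle; };

struct request {
    struct request *next;
    struct batch_bo *bo;      /* NULL when the batch allocation failed */
};

struct request_queue {
    struct request *head;     /* requests awaiting retirement */
    struct request failsafe;  /* preallocated fallback, never queued */
};

/* Queue a request for later retirement, refusing any without a batch. */
static int queue_request(struct request_queue *q, struct request *rq)
{
    if (rq->bo == NULL || rq == &q->failsafe)
        return -1;            /* failed allocation: keep it off the list */
    rq->next = q->head;
    q->head = rq;
    return 0;
}

/* Retire everything queued; each entry may assume rq->bo is valid. */
static int retire_requests(struct request_queue *q)
{
    int retired = 0;

    while (q->head != NULL) {
        struct request *rq = q->head;

        q->head = rq->next;
        assert(rq->bo != NULL);   /* the invariant the crash violated */
        rq->bo->handle = 0;       /* dereference is safe here */
        retired++;
    }
    return retired;
}

/* Returns 0 if the failsafe was rejected and only the good request retired. */
static int demo_failsafe(void)
{
    struct batch_bo bo = { 42 };
    struct request good = { NULL, &bo };
    struct request_queue q = { NULL, { NULL, NULL } };

    if (queue_request(&q, &q.failsafe) != -1)
        return 1;
    if (queue_request(&q, &good) != 0)
        return 2;
    return retire_requests(&q) == 1 ? 0 : 3;
}
```

The check sits at the point of queueing rather than in the retire loop, so
the retire path can keep assuming that every entry owns a batch.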

commit 69d8edc11173df021aa2e158b2530257113141fd
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Aug 7 10:08:17 2015 +0100

    sna: Handle batch allocation failure

    Whilst we currently do not try and submit a failed batch buffer
    allocation, we still treat it as a valid request. This explodes much
    later when we inspect the NULL rq->bo.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=91577
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

is the cause of the crash, but

commit 2d26643cab33a32847afaf13b50d326d09d58bf7
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Fri Nov 13 19:03:36 2015 +0000

    sna/dri2: Drop the reference on the fence when complete

    Fixes regression from

    commit 8d9e496670f48b4eec64dfe1bcedb49793cf3073
    Author: Chris Wilson <chris at chris-wilson.co.uk>
    Date:   Wed Jul 22 11:14:01 2015 +0100

        sna/dri2: Take over the placeholder vblank

    After noting the fence was complete, we would clear it. But I forgot
    that we actually held a reference on to it, and so we would leak the 64k
    batch, and starve the system of available memory in about 18 minutes of
    SwapBuffers.

    Reported-by: Arkadiusz Miskiewicz <arekm at maven.pl>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92911
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>

is where the bug began. The kernel just made it easier to hit the
pre-existing bugs in userspace.
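
The reference leak described in the second commit message above can be
sketched the same way. Again this is only an illustration under stated
assumptions, not the sna/dri2 code: struct fence_bo, struct swap_event and
the helper functions are hypothetical names. It assumes a plain reference
count where the event holds its own reference on the fence, so clearing the
pointer without dropping that reference means the buffer is never freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted buffer; "freed" stands in for the real release. */
struct fence_bo {
    int refcnt;
    int freed;
};

static struct fence_bo *fence_ref(struct fence_bo *bo)
{
    bo->refcnt++;
    return bo;
}

static void fence_unref(struct fence_bo *bo)
{
    assert(bo->refcnt > 0);
    if (--bo->refcnt == 0)
        bo->freed = 1;
}

/* Hypothetical swap event that holds its own reference on the fence. */
struct swap_event {
    struct fence_bo *fence;
};

/* Leaky completion: forgets the pointer but keeps the reference alive. */
static void complete_leaky(struct swap_event *ev)
{
    ev->fence = NULL;
}

/* Fixed completion: drop the reference, then clear the pointer. */
static void complete_fixed(struct swap_event *ev)
{
    if (ev->fence != NULL) {
        fence_unref(ev->fence);
        ev->fence = NULL;
    }
}

/* Returns 0 if only the fixed path ends up releasing its buffer. */
static int demo_fence(void)
{
    struct fence_bo a = { 1, 0 };   /* creator's reference */
    struct fence_bo b = { 1, 0 };
    struct swap_event leaky;
    struct swap_event fixed;

    leaky.fence = fence_ref(&a);
    fixed.fence = fence_ref(&b);

    complete_leaky(&leaky);
    fence_unref(&a);                /* creator drops its ref: one remains */

    complete_fixed(&fixed);
    fence_unref(&b);                /* last reference gone: freed */

    return (a.freed == 0 && a.refcnt == 1 && b.freed == 1) ? 0 : 1;
}
```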
-Chris
--
Chris Wilson, Intel Open Source Technology Centre