[Intel-gfx] [REGRESSION] system hang on ILK/SNB/IVB

Lukas Wunner lukas at wunner.de
Thu Mar 31 15:23:04 UTC 2016


Hi Gabriel,

On Thu, Mar 31, 2016 at 10:42:37AM +0300, Gabriel Feceoru wrote:
> On 31.03.2016 00:35, Lukas Wunner wrote:
> >On Wed, Mar 30, 2016 at 08:20:26PM +0300, Gabriel Feceoru wrote:
> >>This commit causes a hang while running kms suspend tests
> >>(kms_pipe_crc_basic at suspend-read-crc-pipe-*) on ILK/SNB/IVB, affecting CI.
> 
> Tomi already replied; in the meantime I also looked at the results.
> The current regression is for ILK/SNB/IVB only (v1 seemed to affect more
> platforms).
> Unfortunately these machines were not available when v2 was tested, so this
> couldn't be detected.

I develop on an IVB machine and cannot reproduce this. Suspend works fine.

All the patch does is call async_synchronize_full()
(1) when a hotplug event arrives or
(2) when the last DRM client closes the connection.
At least one of these two things seems to be happening on your test
machines when the suspend test runs.

The PM core suspends and resumes individual devices asynchronously and
calls async_synchronize_full() in a couple of places. If a device's PM
callbacks also call async_synchronize_full(), the machine deadlocks.

We don't actually need to call async_synchronize_full(); we only need
to synchronize up to a specific cookie (the one that represents
initialization of the fbdev). So I've just posted a patch that replaces
the calls to async_synchronize_full() with async_synchronize_cookie().
This should make things less fragile and hopefully also solve the hangs
you're seeing.
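
To illustrate the difference between the two calls, here is a toy
userspace model in Python. It is not kernel code and not the posted
patch: the AsyncDomain class and its methods are made up for this
sketch, and it assumes the caller is itself one of the pending async
tasks (as an asynchronously run PM callback would be). As I understand
it, async_synchronize_cookie() waits for calls scheduled prior to the
given cookie, hence the "+ 1" to include the fbdev task itself.

```python
class AsyncDomain:
    """Toy stand-in for the kernel's async machinery (hypothetical)."""

    def __init__(self):
        self.next_cookie = 1
        self.pending = set()

    def schedule(self):
        """Register a new async task and return its increasing cookie."""
        cookie = self.next_cookie
        self.next_cookie += 1
        self.pending.add(cookie)
        return cookie

    def complete(self, cookie):
        """Mark an async task as finished."""
        self.pending.discard(cookie)

    def would_wait_for(self, limit=None):
        """Return the cookies a synchronize call would block on.

        limit=None models async_synchronize_full(): everything pending.
        Otherwise it models async_synchronize_cookie(limit): only tasks
        scheduled before `limit`, i.e. those with cookie < limit.
        """
        return sorted(c for c in self.pending
                      if limit is None or c < limit)


domain = AsyncDomain()
fbdev_cookie = domain.schedule()    # fbdev init, scheduled at probe time
domain.complete(fbdev_cookie)       # finished long before suspend
suspend_cookie = domain.schedule()  # async suspend task, still running

# Called from within the suspend task:
full_wait = domain.would_wait_for()                    # includes ourselves
cookie_wait = domain.would_wait_for(fbdev_cookie + 1)  # nothing left
print(full_wait, cookie_wait)
```

In this model the "full" variant lists the still-running suspend task
among the things it waits for, so a caller inside that task would wait
on itself forever, whereas the cookie variant has nothing to wait for.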

Best regards,

Lukas

> >>
> >>Probably the same problem as the one in v2, but on older HW.
> >>
> >>
> >>commit a7442b93cf32c1e1ddb721a26cd1f92302e2a222
> >>Author: Lukas Wunner <lukas at wunner.de>
> >>Date:   Wed Mar 9 12:52:53 2016 +0100
> >>
> >>     drm/i915: Fix races on fbdev
> >>
> >>     The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> >>     been witnessed to run before intel_fbdev_initial_config_async()
> >>     has finished.
> >>
> >>     We might likewise receive hotplug events before we've had a chance to
> >>     fully set up the fbdev.
> >>
> >>     Fix by waiting for the asynchronous thread to finish.
> >>
> >>     v2:
> >>     An async_synchronize_full() was also added to intel_fbdev_set_suspend()
> >>     in v1 which turned out to be entirely gratuitous. It caused a deadlock
> >>     on suspend (discovered by CI, thanks to Damien Lespiau and Tomi Sarvela
> >>     for CI support) and was unnecessary since a device is never suspended
> >>     until its ->probe callback (and all asynchronous tasks it scheduled)
> >>     have finished. See dpm_prepare(), which calls wait_for_device_probe(),
> >>     which calls async_synchronize_full().
> >>
> >>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> >>     Reported-by: Gustav Fägerlind <gustav.fagerlind at gmail.com>
> >>     Reported-by: "Li, Weinan Z" <weinan.z.li at intel.com>
> >>     Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>     Cc: stable at vger.kernel.org
> >>     Signed-off-by: Lukas Wunner <lukas at wunner.de>
> >>     Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>     Link: http://patchwork.freedesktop.org/patch/msgid/20160309115147.67B2B6E0D3@gabe.freedesktop.org
> >>
> >>
> >>Regards,
> >>Gabriel
> >v2 passed CI fine, save for one warning not caused by the patch:
> >https://patchwork.freedesktop.org/series/4068/
> >
> >For comparison, this was v1:
> >https://patchwork.freedesktop.org/patch/75840/
> >

