[Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.

Fri Jan 27 14:58:08 UTC 2017

On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> > On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > > When writing some testcases for nonblocking modesets. I found out that the
> > > > infinite wait on the old fb was causing issues.
> > > 
> > > The crux of the issue here is the locked wait for old dependencies and
> > > the inability to inject the intel_prepare_reset disabling of all planes.
> > > There are a couple of locked waits on struct_mutex within the modeset
> > > locks for intel_overlay and if we happen to be using the display plane
> > > for the first time.
> > > 
> > > The first I suggested solving using fences to track dependencies and
> > > keep the order between atomic states. Cancelling the outstanding
> > > modesets, replacing with a disable and then on restore jumping to the
> > > final state look doable. It also requires avoiding the struct_mutex for
> > > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > > we've talked about switching to mmio, but for starters we could move the
> > > wait from inside intel_overlay into the fence for the atomic operation.
> > > (But's that a little more surgery than we would like for intel_overlay I
> > > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > > that to an async fenced operation that is then naturally waited upon by
> > > the atomic modeset.
> > 
> > A bit more a hack, but a different idea, and I think hack for gen234.0 is
> > ok:
> > 
> > We complete all the requests before we start the hw reset with fence.error
> > = -EIO. But we do this only when we need to get at the display locks. A
> > slightly more elegant solution would be to trylock modeset locks, and if
> > one of them fails (and only then) complete all requests with -EIO to get
> > the concurrent modeset to proceed before we reset the hardware. That's
> > essentially the logic we had before all the reworks, and it worked. But I
> > didn't look at how scary that all would be to make it work again ...
> 
> The modeset lock may not just be waiting on our requests (even on pnv we
> can expect that there are already users celebrating that pnv+nouveau
> finally works ;) and that the display is not the only user/observer of
> those requests. Using the requests to break the modeset lock just feels
> like the wrong approach.

It's a cycle, and we need to break it somewhere. Another option might be
to break the cycle the same way we do it for gem locks: Wake up everyone
and restart the modeset ioctl. Since the trouble only happens for
synchronous modesets where we hold the locks while waiting for fences, we
can also break out of that and restart. And I also don't think that would
leak to other drivers, after all our gem locking restart dances also don't
leak to other drivers - it's just our own driver's lock which are affected
by these special wakupe semantics.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch