[Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.

Fri Jan 27 14:31:55 UTC 2017

On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > When writing some testcases for nonblocking modesets, I found out that the
> > > infinite wait on the old fb was causing issues.
> > 
> > The crux of the issue here is the locked wait for old dependencies and
> > the inability to inject the intel_prepare_reset disabling of all planes.
> > There are a couple of locked waits on struct_mutex within the modeset
> > locks for intel_overlay and if we happen to be using the display plane
> > for the first time.
> > 
> > The first I suggested solving using fences to track dependencies and
> > keep the order between atomic states. Cancelling the outstanding
> > modesets, replacing with a disable and then on restore jumping to the
> > final state look doable. It also requires avoiding the struct_mutex for
> > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > we've talked about switching to mmio, but for starters we could move the
> > wait from inside intel_overlay into the fence for the atomic operation.
> > (But that's a little more surgery than we would like for intel_overlay I
> > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > wait under struct_mutex for pin_to_display_plane, my plan is to move
> > that to an async fenced operation that is then naturally waited upon by
> > the atomic modeset.
> 
> A bit more of a hack, but a different idea, and I think hack for gen234.0 is
> ok:
> 
> We complete all the requests before we start the hw reset with fence.error
> = -EIO. But we do this only when we need to get at the display locks. A
> slightly more elegant solution would be to trylock modeset locks, and if
> one of them fails (and only then) complete all requests with -EIO to get
> the concurrent modeset to proceed before we reset the hardware. That's
> essentially the logic we had before all the reworks, and it worked. But I
> didn't look at how scary that all would be to make it work again ...
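
[Illustration of the trylock idea quoted above. This is a minimal userspace
sketch using pthreads, not i915 code. Every name in it (sketch_request,
sketch_display, sketch_prepare_reset and so on) is hypothetical, and plain
mutexes stand in for the modeset locks. It only shows the control flow: try
all locks without blocking, and only if one is contended complete the
outstanding requests with -EIO. A real reset path would then presumably
block on the locks, which the sketch leaves to the caller.]

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU request with a fence-style error field. */
struct sketch_request {
	bool completed;
	int error;		/* 0, or -EIO once force-completed */
};

/* Hypothetical stand-in for the set of modeset locks. */
struct sketch_display {
	pthread_mutex_t *locks;
	size_t nlocks;
};

/* Take every lock without blocking; on failure drop the ones already held. */
static bool sketch_trylock_all(struct sketch_display *d)
{
	for (size_t i = 0; i < d->nlocks; i++) {
		if (pthread_mutex_trylock(&d->locks[i]) != 0) {
			while (i-- > 0)
				pthread_mutex_unlock(&d->locks[i]);
			return false;
		}
	}
	return true;
}

static void sketch_unlock_all(struct sketch_display *d)
{
	for (size_t i = 0; i < d->nlocks; i++)
		pthread_mutex_unlock(&d->locks[i]);
}

/* Complete every outstanding request with -EIO; return how many were hit. */
static size_t sketch_cancel_requests(struct sketch_request *reqs, size_t n)
{
	size_t cancelled = 0;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].completed) {
			reqs[i].error = -EIO;
			reqs[i].completed = true;
			cancelled++;
		}
	}
	return cancelled;
}

/*
 * Uncontended case: all locks are held on return and no request is touched
 * (returns 0). Contended case: no lock is held on return, the outstanding
 * requests have been completed with -EIO, and the count is returned; the
 * caller is expected to take the locks with a blocking lock afterwards.
 */
static size_t sketch_prepare_reset(struct sketch_display *d,
				   struct sketch_request *reqs, size_t n)
{
	if (sketch_trylock_all(d))
		return 0;
	return sketch_cancel_requests(reqs, n);
}
```

[The point of the "and only then" condition is that requests are left alone
whenever the locks can be taken without contention.]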

The modeset lock may not just be waiting on our requests (even on pnv we
can expect that there are already users celebrating that pnv+nouveau
finally works ;) and the display is not the only user/observer of
those requests. Using the requests to break the modeset lock just feels
like the wrong approach.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre