[Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.

Fri Jan 27 15:08:45 UTC 2017

On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> > On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> > > On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > > > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > > > When writing some testcases for nonblocking modesets. I found out that the
> > > > > infinite wait on the old fb was causing issues.
> > > > 
> > > > The crux of the issue here is the locked wait for old dependencies and
> > > > the inability to inject the intel_prepare_reset disabling of all planes.
> > > > There are a couple of locked waits on struct_mutex within the modeset
> > > > locks for intel_overlay and if we happen to be using the display plane
> > > > for the first time.
> > > > 
> > > > The first I suggested solving using fences to track dependencies and
> > > > keep the order between atomic states. Cancelling the outstanding
> > > > modesets, replacing with a disable and then on restore jumping to the
> > > > final state look doable. It also requires avoiding the struct_mutex for
> > > > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > > > we've talked about switching to mmio, but for starters we could move the
> > > > wait from inside intel_overlay into the fence for the atomic operation.
> > > > (But's that a little more surgery than we would like for intel_overlay I
> > > > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > > > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > > > that to an async fenced operation that is then naturally waited upon by
> > > > the atomic modeset.
> > > 
> > > A bit more a hack, but a different idea, and I think hack for gen234.0 is
> > > ok:
> > > 
> > > We complete all the requests before we start the hw reset with fence.error
> > > = -EIO. But we do this only when we need to get at the display locks. A
> > > slightly more elegant solution would be to trylock modeset locks, and if
> > > one of them fails (and only then) complete all requests with -EIO to get
> > > the concurrent modeset to proceed before we reset the hardware. That's
> > > essentially the logic we had before all the reworks, and it worked. But I
> > > didn't look at how scary that all would be to make it work again ...
> > 
> > The modeset lock may not just be waiting on our requests (even on pnv we
> > can expect that there are already users celebrating that pnv+nouveau
> > finally works ;) and that the display is not the only user/observer of
> > those requests. Using the requests to break the modeset lock just feels
> > like the wrong approach.
> 
> It's a cycle, and we need to break it somewhere. Another option might be
> to break the cycle the same way we do it for gem locks: Wake up everyone
> and restart the modeset ioctl. Since the trouble only happens for
> synchronous modesets where we hold the locks while waiting for fences, we
> can also break out of that and restart. And I also don't think that would
> leak to other drivers, after all our gem locking restart dances also don't
> leak to other drivers - it's just our own driver's lock which are affected
> by these special wakupe semantics.

It's a queue of nonblocking modesets that we need to worry about, afaik.
Moving the wait for blocking modeset outside of modeset lock is easily
achievable (and avoiding the other waits under both the modeset + 
struct_mutex I have at least an idea for). So the challenge is how to
inject all-planes-off for gen3 and then allow the queue to continue again
afterwards.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre