[Intel-gfx] [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore

Fri Jun 30 15:39:03 UTC 2017

Quoting Daniel Vetter (2017-06-30 16:30:59)
> On Fri, Jun 30, 2017 at 04:53:19PM +0300, Ville Syrjälä wrote:
> > On Fri, Jun 30, 2017 at 03:35:03PM +0200, Daniel Vetter wrote:
> > > On Thu, Jun 29, 2017 at 10:26:08PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Jun 29, 2017 at 06:57:30PM +0100, Chris Wilson wrote:
> > > > > Quoting ville.syrjala at linux.intel.com (2017-06-29 15:36:42)
> > > > > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > > > 
> > > > > > Introduce an rw_semaphore to protect the display commits. All normal
> > > > > > commits use down_read() and hence can proceed in parallel, but GPU reset
> > > > > > will use down_write() making sure no other commits are in progress when
> > > > > > we have to pull the plug on the display engine on pre-g4x platforms.
> > > > > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail()
> > > > > > itself, and we wait for all dependencies before the down_read(), and
> > > > > > thus there is no chance of deadlocks with this scheme.
> > > > > 
> > > > > This matches what I thought should be done (I didn't think of using
> > > > > rwsem just a mutex, nice touch). The point I got stuck on was what
> > > > > should be done after the reset? I expected another modeset to return the
> > > > > state back or otherwise the inflight would get confused?
> > > > 
> > > > I guess that can happen. For instance, if we have a crtc_enable() in flight,
> > > > and then we do a reset before it gets committed we would end up doing
> > > > crtc_enable() twice in a row without a crtc_disable in between. For page
> > > > flips and such this shouldn't be a big deal in general.
> > > 
> > > atomic commits are ordered. You have to wait for the previous ones to
> > > complete before you do a new one. If you don't do that, then all hell
> > > breaks loose.
> > 
> > What we're effectively doing here is inserting two new commits in
> > the middle of the timeline, one to disable everything, and another
> > one to re-enable everything. At the end of the the re-enable we would
> > want the hardware state should match exactly what was there before
> > the disable, hence any commits still in the timeline should work
> > correctly. That is, their old_state will match the hardware state
> > when it comes time to commit them.
> > 
> > But we can only do that properly after we start to track the committed
> > state. Without that tracking we can get into trouble wrt. the
> > hardware state not matching the old state when it comes time to
> > commit the new state.
> 
> Yeah, but my point is that this here is an extremely fancy and fragile
> (and afaics in this form, incomplete) fix for something that in the past
> was fixed much, much simpler. Why do we need this massive amount of
> complexity now? Who's asking for all this (we don't even have anyone yet
> asking for fully queued atomic commits, much less on gen4)?

It was never "fixed", it was always borked; broken by design.
-Chris