[Intel-gfx] [PATCH 01/11] drm/i915/gt: Close race between engine_park and intel_gt_retire_requests

Sat Nov 16 17:54:38 UTC 2019

Quoting Chris Wilson (2019-11-16 17:51:29)
> The general concept was that intel_timeline.active_count was locked by
> the intel_timeline.mutex. The exception was for power management, where
> the engine->kernel_context->timeline could be manipulated under the
> global wakeref.mutex.
> 
> This was quite solid, as we always manipulated the timeline only while
> we held an engine wakeref.
> 
> And then we started retiring requests outside of struct_mutex, only
> using the timelines.active_list and the timeline->mutex. There we
> started manipulating intel_timeline.active_count outside of an engine
> wakeref, and so introduced a race between __engine_park() and
> intel_gt_retire_requests(), a race that could result in the
> engine->kernel_context not being added to the active timelines and so
> losing requests, which caused us to keep the system permanently powered
> up [and unloadable].
> 
> The race would be easy to close if we could take the engine wakeref for
> the timeline before we retire -- except timelines are not bound to any
> engine and so we would need to keep all active engines awake. The
> alternative is to guard intel_timeline_enter/intel_timeline_exit for use
> outside of the timeline->mutex.

So the idea of taking a wakeref on all active engines during retire is
growing on me. That's the less frequent path and so avoids adding
another set of atomic manipulations to every request.
-Chris