[Intel-gfx] [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere

Tue Aug 29 09:07:18 UTC 2017

Quoting Jeff McGee (2017-08-28 21:18:44)
> On Mon, Aug 28, 2017 at 08:44:48PM +0100, Chris Wilson wrote:
> > Quoting jeff.mcgee at intel.com (2017-08-28 20:25:30)
> > > From: Jeff McGee <jeff.mcgee at intel.com>
> > > 
> > > If someone else is resetting the engine we should clear our own bit as
> > > part of skipping that engine. Otherwise we will later believe that it
> > > has not been reset successfully and then trigger full gpu reset. If the
> > > other guy's reset actually fails, he will trigger the full gpu reset.
> > 
> > The reason we did continue on to the global reset was to serialise
> > i915_handle_error() with the other thread. Not a huge issue, but a
> > reasonable property to keep -- and we definitely want a to explain why
> > only one reset at a time is important.
> > 
> > bool intel_engine_lock_reset() {
> >       if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> >                             &engine->i915->gpu_error.flags))
> >               return true;
> > 
> >       intel_engine_wait_for_reset(engine);
> The current code doesn't wait for the other thread to finish the reset, but
> this would add that wait. 

Pardon? If we can't reset the engine, we go to the full reset which is
serialised, both with individual engine resets and other globals.

> Did you intend that as an additional change to
> the current code? I don't think it is necessary. Each thread wants to
> reset some subset of engines, so it seems the thread can safely exit as soon
> as it knows each of those engines has been reset or is being reset as part
> of another thread that got the lock first. If any of the threads fail to
> reset an engine they "own", then full gpu reset is assured.

It's unexpected for this function to return before the reset.
-Chris