[Intel-gfx] [PATCH 1/5] drm/i915: Remove stale asserts from i915_gem_find_active_request()

Wed May 30 11:14:12 UTC 2018

Quoting Tvrtko Ursulin (2018-05-30 12:01:47)
> 
> On 30/05/2018 11:55, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-30 11:43:16)
> >>
> >> On 29/05/2018 14:29, Chris Wilson wrote:
> >>> Since we use i915_gem_find_active_request() from inside
> >>> intel_engine_dump() and may call that at any time, we do not guarantee
> >>> that the engine is paused nor that the signal kthreads and irq handler
> >>> are suspended, so we cannot assert that the breadcrumb doesn't advance
> >>> and that the irq hasn't happened on another CPU signaling the request we
> >>> believe to be idle.
> >>>
> >>> Fixes: f636edb214a5 ("drm/i915: Make i915_engine_info pretty printer to standalone")
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++---------
> >>>    1 file changed, 8 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>> index 05f44ca35a06..530d6d0109b4 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>> @@ -2972,23 +2972,22 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
> >>>        struct i915_request *request, *active = NULL;
> >>>        unsigned long flags;
> >>>    
> >>> -     /* We are called by the error capture and reset at a random
> >>> -      * point in time. In particular, note that neither is crucially
> >>> -      * ordered with an interrupt. After a hang, the GPU is dead and we
> >>> -      * assume that no more writes can happen (we waited long enough for
> >>> -      * all writes that were in transaction to be flushed) - adding an
> >>> +     /*
> >>> +      * We are called by the error capture, reset and to dump engine
> >>> +      * state at random points in time. In particular, note that neither is
> >>> +      * crucially ordered with an interrupt. After a hang, the GPU is dead
> >>> +      * and we assume that no more writes can happen (we waited long enough
> >>> +      * for all writes that were in transaction to be flushed) - adding an
> >>>         * extra delay for a recent interrupt is pointless. Hence, we do
> >>>         * not need an engine->irq_seqno_barrier() before the seqno reads.
> >>> +      * At all other times, we must assume the GPU is still running, but
> >>> +      * we only care about the snapshot of this moment.
> >>>         */
> >>>        spin_lock_irqsave(&engine->timeline.lock, flags);
> >>>        list_for_each_entry(request, &engine->timeline.requests, link) {
> >>>                if (__i915_request_completed(request, request->global_seqno))
> >>>                        continue;
> >>>    
> >>> -             GEM_BUG_ON(request->engine != engine);
> >>
> >> Removal of this one belongs to virtual engine perhaps?
> > 
> > Nah, even with virtual engine; the engine timeline list is still true. I
> > was just thinking it was too random to check here (e.g. in the middle of
> > error capture) and that our more recent addition of checking during
> > retirement was a little more rigorous (and timely).
> 
> It is a random check indeed. Commit message append: "While at it remove 
> a random assert on..."
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Pushed this patch by itself, just so I can have a new CI baseline :)
Thanks for the review,
-Chris