[Intel-gfx] [PATCH 2/5] drm/i915/selftests: Flush the active callbacks
Chris Wilson
chris at chris-wilson.co.uk
Fri Nov 22 13:55:19 UTC 2019
Quoting Tvrtko Ursulin (2019-11-22 13:43:17)
>
> On 22/11/2019 13:09, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-22 13:01:56)
> >>
> >> On 22/11/2019 11:21, Chris Wilson wrote:
> >>> Before checking the current i915_active state for the asynchronous work
> >>> we submitted, flush any ongoing callback. This ensures that our sampling
> >>> is robust and does not sporadically fail due to bad timing as the work
> >>> is running on another cpu.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>> drivers/gpu/drm/i915/gt/selftest_context.c | 19 +++++++++++++------
> >>> 1 file changed, 13 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
> >>> index 3586af636304..939798338242 100644
> >>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c
> >>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
> >>> @@ -48,20 +48,25 @@ static int context_sync(struct intel_context *ce)
> >>>
> >>> mutex_lock(&tl->mutex);
> >>> do {
> >>> - struct dma_fence *fence;
> >>> + struct i915_request *rq;
> >>> long timeout;
> >>>
> >>> - fence = i915_active_fence_get(&tl->last_request);
> >>> - if (!fence)
> >>> + if (list_empty(&tl->requests))
> >>> break;
> >>>
> >>> - timeout = dma_fence_wait_timeout(fence, false, HZ / 10);
> >>> + rq = list_last_entry(&tl->requests, typeof(*rq), link);
> >>> + i915_request_get(rq);
> >>> +
> >>> + timeout = i915_request_wait(rq, 0, HZ / 10);
> >>> if (timeout < 0)
> >>> err = timeout;
> >>> else
> >>> - i915_request_retire_upto(to_request(fence));
> >>> + i915_request_retire_upto(rq);
> >>>
> >>> - dma_fence_put(fence);
> >>> + spin_lock_irq(&rq->lock);
> >>> + spin_unlock_irq(&rq->lock);
> >>
> >> Ugh.. this at least needs a comment.
> >>
> >> What is the actual race?
> >
> > We complete the sync before the interrupt handler's irq work has
> > finished executing the callback to mark the barrier as completed.
> > So when we look at whether the engine is parked or not, we see the state
> > before the request has finished processing and find it still active.
>
> You mean i915_active_fence_get does not really return the last request?
> How can that be? Or is the key just intel_engine_pm_flush?
The active ingredient in this patch should be using tl->requests instead
of tl->last_request, and then using intel_engine_pm_flush().
Looking at what was there before:
> >>> - fence = i915_active_fence_get(&tl->last_request);
is definitely where I went "yikes, if we are skipping because
last_request is NULL, we had better make sure that the barrier callbacks
on the rq->fence.cb_list were executed" which means waiting until the
interrupt handler dropped rq->lock.
Looking at the code that is there now, retiring the known list of
requests (and then looping the engine_park addition) should be accurate.
-Chris