[Intel-gfx] [PATCH 2/5] drm/i915/selftests: Flush the active callbacks
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Nov 22 14:17:42 UTC 2019
On 22/11/2019 13:55, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-22 13:43:17)
>>
>> On 22/11/2019 13:09, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-11-22 13:01:56)
>>>>
>>>> On 22/11/2019 11:21, Chris Wilson wrote:
>>>>> Before checking the current i915_active state for the asynchronous work
>>>>> we submitted, flush any ongoing callback. This ensures that our sampling
>>>>> is robust and does not sporadically fail due to bad timing as the work
>>>>> is running on another cpu.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> ---
>>>>> drivers/gpu/drm/i915/gt/selftest_context.c | 19 +++++++++++++------
>>>>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
>>>>> index 3586af636304..939798338242 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c
>>>>> @@ -48,20 +48,25 @@ static int context_sync(struct intel_context *ce)
>>>>>
>>>>> mutex_lock(&tl->mutex);
>>>>> do {
>>>>> - struct dma_fence *fence;
>>>>> + struct i915_request *rq;
>>>>> long timeout;
>>>>>
>>>>> - fence = i915_active_fence_get(&tl->last_request);
>>>>> - if (!fence)
>>>>> + if (list_empty(&tl->requests))
>>>>> break;
>>>>>
>>>>> - timeout = dma_fence_wait_timeout(fence, false, HZ / 10);
>>>>> + rq = list_last_entry(&tl->requests, typeof(*rq), link);
>>>>> + i915_request_get(rq);
>>>>> +
>>>>> + timeout = i915_request_wait(rq, 0, HZ / 10);
>>>>> if (timeout < 0)
>>>>> err = timeout;
>>>>> else
>>>>> - i915_request_retire_upto(to_request(fence));
>>>>> + i915_request_retire_upto(rq);
>>>>>
>>>>> - dma_fence_put(fence);
>>>>> + spin_lock_irq(&rq->lock);
>>>>> + spin_unlock_irq(&rq->lock);
>>>>
>>>> Ugh.. this at least needs a comment.
>>>>
>>>> What is the actual race?
>>>
>>> We complete the sync before the interrupt handler's irq work has
>>> finished executing the callback to mark the barrier as completed.
>>> So when we look at whether the engine is parked or not, we see the state
>>> before the request has finished processing and find it still active.
>>
>> You mean i915_active_fence_get does not really return the last request?
>> How can that be? Or is the key just intel_engine_pm_flush?
>
> The active ingredient in this patch should be using the tl->requests
> instead of tl->last_request and then using pm_flush()
>
> Looking at what was there before:
>>>>> - fence = i915_active_fence_get(&tl->last_request);
>
> is definitely where I went "yikes, if we are skipping because
> last_request is NULL, we had better make sure that the barrier callbacks
> on the rq->fence.cb_list were executed" which means waiting until the
> interrupt handler dropped rq->lock.
>
> Looking at the code that is there now, retiring the known list of
> requests (and then looping the engine_park addition) should be accurate.
Hmm, I hate relying on such intimate knowledge of the internals. And I am
still not following. :( Is some path bypassing the addition to
tl->last_request?
I see that the selftest checks two things: first, i915_active_is_idle,
which needs the context to be retired.
The second is intel_engine_pm_is_awake, which alone would be handled by
intel_engine_gt_pm_flush.
So the first check, i915_active_is_idle, is the problematic one I think.
Regards,
Tvrtko