[Intel-gfx] [PATCH 11/15] drm/i915/execlists: Cancel banned contexts on schedule-out

Mon Oct 14 12:25:58 UTC 2019

On 14/10/2019 13:06, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-10-14 13:00:01)
>>
>> On 14/10/2019 10:07, Chris Wilson wrote:
>>> On schedule-out (CS completion) of a banned context, scrub the context
>>> image so that we do not replay the active payload. The intent is that we
>>> skip banned payloads on request submission so that the timeline
>>> advancement continues on in the background. However, if we are returning
>>> to a preempted request, i915_request_skip() is ineffective and instead we
>>> need to patch up the context image so that it continues from the start
>>> of the next request.
>>>
>>> v2: Fixup cancellation so that we only scrub the payload of the active
>>> request and do not short-circuit the breadcrumbs (which might cause
>>> other contexts to execute out of order).
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c    | 129 ++++++++----
>>>    drivers/gpu/drm/i915/gt/selftest_lrc.c | 273 +++++++++++++++++++++++++
>>>    2 files changed, 361 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index e16ede75412b..b76b35194114 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -234,6 +234,9 @@ static void execlists_init_reg_state(u32 *reg_state,
>>>                                     const struct intel_engine_cs *engine,
>>>                                     const struct intel_ring *ring,
>>>                                     bool close);
>>> +static void
>>> +__execlists_update_reg_state(const struct intel_context *ce,
>>> +                          const struct intel_engine_cs *engine);
>>>    
>>>    static void cancel_timer(struct timer_list *t)
>>>    {
>>> @@ -270,6 +273,31 @@ static void mark_eio(struct i915_request *rq)
>>>        i915_request_mark_complete(rq);
>>>    }
>>>    
>>> +static struct i915_request *active_request(struct i915_request *rq)
>>> +{
>>> +     const struct intel_context * const ce = rq->hw_context;
>>> +     struct i915_request *active = NULL;
>>> +     struct list_head *list;
>>> +
>>> +     if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
>>> +             return rq;
>>> +
>>> +     rcu_read_lock();
>>> +     list = &rcu_dereference(rq->timeline)->requests;
>>> +     list_for_each_entry_from_reverse(rq, list, link) {
>>> +             if (i915_request_completed(rq))
>>> +                     break;
>>> +
>>> +             if (rq->hw_context != ce)
>>> +                     break;
>>> +
>>> +             active = rq;
>>> +     }
>>> +     rcu_read_unlock();
>>> +
>>> +     return active;
>>> +}
>>> +
>>>    static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine)
>>>    {
>>>        return (i915_ggtt_offset(engine->status_page.vma) +
>>> @@ -991,6 +1019,56 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>>>                tasklet_schedule(&ve->base.execlists.tasklet);
>>>    }
>>>    
>>> +static void restore_default_state(struct intel_context *ce,
>>> +                               struct intel_engine_cs *engine)
>>> +{
>>> +     u32 *regs = ce->lrc_reg_state;
>>> +
>>> +     if (engine->pinned_default_state)
>>> +             memcpy(regs, /* skip restoring the vanilla PPHWSP */
>>> +                    engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
>>> +                    engine->context_size - PAGE_SIZE);
>>> +
>>> +     execlists_init_reg_state(regs, ce, engine, ce->ring, false);
>>> +}
>>> +
>>> +static void cancel_active(struct i915_request *rq,
>>> +                       struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_context * const ce = rq->hw_context;
>>> +
>>> +     /*
>>> +      * The executing context has been cancelled. Fixup the context so that
>>> +      * it will be marked as incomplete [-EIO] upon resubmission and not

(read below first)

... and not misleadingly say "Fixup the context so that it will be 
marked as incomplete" because there is nothing in this function which 
does that. It mostly happens by the virtual of context already being 
marked as banned somewhere else. This comment should just explain the 
decision to rewind the ring->head for more determinism. It can still 
mention canceling of user payload and -EIO. Just needs to be clear of 
the single extra thing achieved here by the head rewind and context edit.

>>> +      * execute any user payloads. We preserve the breadcrumbs and
>>> +      * semaphores of the incomplete requests so that inter-timeline
>>> +      * dependencies (i.e other timelines) remain correctly ordered.
>>> +      *
>>> +      * See __i915_request_submit() for applying -EIO and removing the
>>> +      * payload on resubmission.
>>> +      */
>>> +     GEM_TRACE("%s(%s): { rq=%llx:%lld }\n",
>>> +               __func__, engine->name, rq->fence.context, rq->fence.seqno);
>>> +     __context_pin_acquire(ce);
>>> +
>>> +     /* On resubmission of the active request, payload will be scrubbed */
>>> +     rq = active_request(rq);
>>> +     if (rq)
>>> +             ce->ring->head = intel_ring_wrap(ce->ring, rq->head);
>>
>> Without this change, where would the head be pointing after
>> schedule_out? Somewhere in the middle of the active request? And with
>> this change it is rewound to the start of it?
> 
> RING_HEAD could be anywhere between rq->head and rq->tail. We could just
> leave it be, but it would be better if we reset it if it was past
> rq->postfix (as we rewrite those instructions). It's just simpler if we
> reset it back to the start, and scrub the request.

Okay, now I follow. I suggest going back up ^^^ and making the comment 
clear...

Regards,

Tvrtko