[Intel-gfx] [PATCH 2/5] drm/i915/perf: allow holding preemption on filtered ctx

Mon May 27 22:11:33 UTC 2019

On 24/05/2019 11:07, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-05-24 10:51:49)
>> On 24/05/2019 10:42, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-05-24 10:28:16)
>>>> On 21/05/2019 17:36, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2019-05-21 15:08:52)
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> index f263a8374273..2ad95977f7a8 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>> @@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
>>>>>>            if (IS_ERR(cs))
>>>>>>                    return PTR_ERR(cs);
>>>>>>     
>>>>>> -       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>>>> +       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
>>>>> My prediction is that this will result in this context being reset due
>>>>> to preemption timeouts and the context under profile being banned. Note
>>>>> that preemption timeouts will be the primary means for hang detection
>>>>> for endless batches.
>>>>> -Chris
>>>>>
>>>> Another thought :
>>>>
>>>> What if we ran with the max priority?
>>>> It would be fine to have the hangcheck preempt the workload (it's pretty
>>>> short and shouldn't affect perf counters from 3d/compute pipeline much)
>>>> as long as we ensure nothing else runs.
>>> It's certainly safer from the pov that we don't block preemption and so
>>> don't incur forced resets. Not keen on the system being perturbed by the
>>> act of observing it, and I still dislike the notion of permitting one
>>> client to hog the GPU so easily. Makes me think of RT throttling, and
>>> generally throwing out the absolute priority system (in exchange for
>>> computed deadlines or something).
>>> -Chris
>>>
>> I don't like it much either but I can't see how to do otherwise with the
>> hardware we currently have.
>>
>> I'm thinking of 2 priority values: one for scheduling, one once running.
> It's not quite that easy as you may start running concurrently with one
> of your dependencies and must therefore manage the priority inversion if
> you boost yourself. And I've just gone through and thrown out the
> current complexity of manipulating priority as they run because it made
> timeslicing much harder (where the priority was changing between
> evaluating the need for the context switch and the context switch
> occurring -- such mistakes can be noticed in throughput sensitive
> transcode workloads).

It's like you wrote a scheduler before!

Here is how I could see this working.

I can see 3 different stages for a request:

   - waiting on dependencies

   - in the engine queue

   - in the HW

The request would maintain its normal/default priority until it hits the HW.

When hitting the HW for the first time, its priority is upgraded to perf 
priority so that it sticks to the HW until completion (or some other 
timeout kicks it off the HW).
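
To make the three stages concrete, here is a rough standalone sketch in plain C. 
It is not i915 code: every name in it (sketch_request, sketch_stage, perf_prio, 
the helper functions) is made up for illustration, and it only models the 
one-time priority upgrade on first HW submission. It ignores the priority 
inversion against dependencies mentioned above, and it does not say what 
should happen once a timeout kicks the request off the HW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in i915. */
enum sketch_stage {
	SKETCH_WAITING_DEPS,	/* waiting on dependencies */
	SKETCH_QUEUED,		/* in the engine queue */
	SKETCH_ON_HW,		/* in the HW */
};

struct sketch_request {
	enum sketch_stage stage;
	int prio;		/* priority the scheduler currently sees */
	int perf_prio;		/* priority to switch to once on the HW */
	bool boosted;		/* set after the first HW submission */
};

static void sketch_request_init(struct sketch_request *rq,
				int prio, int perf_prio)
{
	assert(rq != NULL);
	rq->stage = SKETCH_WAITING_DEPS;
	rq->prio = prio;
	rq->perf_prio = perf_prio;
	rq->boosted = false;
}

/* All dependencies signaled: move to the engine queue, priority unchanged. */
static void sketch_deps_signaled(struct sketch_request *rq)
{
	assert(rq->stage == SKETCH_WAITING_DEPS);
	rq->stage = SKETCH_QUEUED;
}

/*
 * Request reaches the HW. Only the first submission upgrades the
 * priority, and the upgrade never lowers it.
 */
static void sketch_submit_to_hw(struct sketch_request *rq)
{
	assert(rq->stage == SKETCH_QUEUED);
	rq->stage = SKETCH_ON_HW;
	if (!rq->boosted) {
		if (rq->perf_prio > rq->prio)
			rq->prio = rq->perf_prio;
		rq->boosted = true;
	}
}
```

The point of keeping prio untouched until sketch_submit_to_hw() is that the 
request does not jump the queue; it only becomes hard to preempt once it is 
already running.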

Does that still sound broken?

Thanks a lot,

-Lionel

>
>> Most contexts would have both values equal.
>>
>> Could mitigate the issue a bit?
> A bit, it gives you a soft notion of a no-preempt flag without queue
> jumping. rq_prio(rq) | intel_context->effective_priority or somesuch.
> -Chris
>