[Intel-gfx] [PATCH 1/6] drm/i915: Limit C-states when waiting for the active request

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Aug 6 10:28:59 UTC 2018


On 06/08/2018 10:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-06 10:34:54)
>>
>> On 06/08/2018 09:30, Chris Wilson wrote:
>>> If we are waiting for the currently executing request, we have a good
>>> idea that it will be completed in the very near future and so want to
>>> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
>>
>> I cannot shake the opinion that we shouldn't be doing this. For instance
>> what if the client has been re-niced (down), or it has re-niced itself?
>> Obviously wrong to apply this for those.
> 
> Niceness only restricts the task's position on the scheduler runqueue;
> it doesn't actually have any cpufreq implications (give or take RT
> heuristics). So I don't think we need a tsk->prio restriction.

I was thinking that such a client obviously doesn't care about latency
(more or less), so we would be incorrectly applying the PM QoS request
on its behalf.
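
To make my concern concrete, as I understand it the wait path in the
patch boils down to something like this (a simplified sketch only, not
the exact code; the real patch allocates and tracks the request
differently):

  #include <linux/pm_qos.h>

  struct pm_qos_request qos;

  /* Forbid deep C-states for the duration of the wait. */
  pm_qos_add_request(&qos, PM_QOS_CPU_DMA_LATENCY, 0);

  /* ... sleep until the request completes ... */

  pm_qos_remove_request(&qos);

So a client which has re-niced itself down still gets the C-state cap
for every such wait, whether it wants it or not.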

>> Or when you say we have a good idea something will be completed in the
>> very near future. Say there is a 60fps workload which is sending 5ms
>> batches and waits on them. That would be 30% of time spent outside of
>> low C states for a workload which doesn't need it.
> 
> Quite frankly, they shouldn't be using wait on the current frame. For
> example, in mesa you wait for the end of the previous frame which should
> be roughly complete, and since it is a stall before computing the next,
> latency is still important.

Maybe, but I don't think we should assume the details of how a client
might use it, and create limitations / hidden gotchas at this level
when clients do not behave as we expect / prescribe.

But I have noticed that so far you have been avoiding commenting on the
idea of an explicit flag. :)
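
To be explicit about what I mean, something along these lines (uapi
sketch only; the flag name and bit are invented here):

  /* include/uapi/drm/i915_drm.h - existing struct plus a hypothetical flag */
  #define I915_GEM_WAIT_LOW_LATENCY (1u << 0)    /* invented name */

  struct drm_i915_gem_wait {
          __u32 bo_handle;
          __u32 flags;          /* currently must be zero */
          __s64 timeout_ns;
  };

Then i915_gem_wait_ioctl() would only install the CPU_DMA_LATENCY
request when the client has asked for it:

  if (args->flags & I915_GEM_WAIT_LOW_LATENCY)
          /* apply the PM QoS cap around the wait */

That way latency-sensitive call-sites opt in explicitly and everyone
else keeps the current behaviour.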

Regards,

Tvrtko

>   
>> Also, having read what the OpenCL driver does, where they want to apply
>> different wait optimisations for different call-sites, the idea that we
>> should instead be introducing a low-latency flag to the wait ioctl sounds
>> more appropriate.
> 
> I'm not impressed by what I've heard there yet. There's also the
> dilemma with what to do with dma-fence poll().
> 
>>> +             if (!qos &&
>>> +                 i915_seqno_passed(intel_engine_get_seqno(rq->engine),
>>> +                                   wait.seqno - 1))
>>
>> I also realized that this will get incorrectly applied when there is
>> preemption. If a low-priority request gets preempted after we applied
>> the PM QoS it will persist for much longer than intended. (Until the
>> high-prio request completes and then the low-prio one.) And the explicit
>> low-latency wait flag would have the same problem. We could perhaps go
>> with removing the PM QoS request if preempted. It should not be frequent
>> enough to cause issues with too much traffic on the API. But
> 
> Sure, I didn't think it was worth worrying about. We could cancel it and
> reset it on next execution.
>   
>> Another side note - quick grep shows there are a few other "seqno - 1"
>> callsites so perhaps we should add a helper for this with a more
>> self-explanatory like __i915_seqno_is_executing(engine, seqno) or something?
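
Something like this is what I had in mind (a sketch only; the name and
exact placement are obviously up for debate):

  /*
   * Has the engine already started executing the request with this
   * seqno, i.e. has it advanced past the preceding breadcrumb?
   */
  static inline bool
  __i915_seqno_is_executing(struct intel_engine_cs *engine, u32 seqno)
  {
          return i915_seqno_passed(intel_engine_get_seqno(engine),
                                   seqno - 1);
  }
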
> 
> I briefly considered something along those lines,
> intel_engine_has_signaled(), intel_engine_has_started(). I also noticed
> that I didn't kill i915_request_started even though I thought we had.
> -Chris
> 

