[Intel-gfx] [PATCH 02/10] drm/i915: Adjust PM QoS response frequency based on GPU load.
Francisco Jerez
currojerez at riseup.net
Wed Mar 18 19:42:11 UTC 2020
Francisco Jerez <currojerez at riseup.net> writes:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
>> Quoting Francisco Jerez (2020-03-10 21:41:55)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index b9b3f78f1324..a5d7a80b826d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1577,6 +1577,11 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>>>  	/* we need to manually load the submit queue */
>>>  	if (execlists->ctrl_reg)
>>>  		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
>>> +
>>> +	if (execlists_num_ports(execlists) > 1 &&
>> pending[1] is always defined; the minimum submission is one slot, with
>> pending[1] as the sentinel NULL.
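
(Right -- IIRC pending[] is declared with room for a trailing NULL
sentinel, something along the lines of:

	struct i915_request *pending[EXECLIST_MAX_PORTS + 1];

so dereferencing pending[1] is always safe here.)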
>>
>>> +	    execlists->pending[1] &&
>>> +	    !atomic_xchg(&execlists->overload, 1))
>>> +		intel_gt_pm_active_begin(&engine->i915->gt);
>>
>> engine->gt
>>
>
> Applied your suggestions above locally, will probably wait to have a few
> more changes batched up before sending a v2.
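
For reference, with both of those applied the check would reduce to
something like the following (untested sketch, will be folded into v2):

	if (execlists->pending[1] &&
	    !atomic_xchg(&execlists->overload, 1))
		intel_gt_pm_active_begin(engine->gt);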
>
>>> }
>>>
>>> static bool ctx_single_port_submission(const struct intel_context *ce)
>>> @@ -2213,6 +2218,12 @@ cancel_port_requests(struct intel_engine_execlists * const execlists)
>>>  	clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));
>>>
>>>  	WRITE_ONCE(execlists->active, execlists->inflight);
>>> +
>>> +	if (atomic_xchg(&execlists->overload, 0)) {
>>> +		struct intel_engine_cs *engine =
>>> +			container_of(execlists, typeof(*engine), execlists);
>>> +		intel_gt_pm_active_end(&engine->i915->gt);
>>> +	}
>>> }
>>>
>>> static inline void
>>> @@ -2386,6 +2397,9 @@ static void process_csb(struct intel_engine_cs *engine)
>>>  			/* port0 completed, advanced to port1 */
>>>  			trace_ports(execlists, "completed", execlists->active);
>>> +
>>> +			if (atomic_xchg(&execlists->overload, 0))
>>> +				intel_gt_pm_active_end(&engine->i915->gt);
>>
>> So this loses track if we preempt a dual-ELSP submission with a
>> single-ELSP submission (and never go back to dual).
>>
>
> Yes, good point. You're right that if a dual-ELSP submission gets
> preempted by a single-ELSP submission, "overload" will remain signaled
> until the first completion interrupt arrives (e.g. from the preempting
> submission).
>
>> If you move this to the end of the loop and check
>>
>> 	if (!execlists->active[1] && atomic_xchg(&execlists->overload, 0))
>> 		intel_gt_pm_active_end(engine->gt);
>>
>> so that it covers both preemption/promotion and completion.
>>
>
> That sounds reasonable.
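
Something like this at the bottom of the CSB event loop in
process_csb(), I assume (untested, exact placement TBD):

	/* Covers both promotion to a new ELSP and completion to idle. */
	if (!execlists->active[1] &&
	    atomic_xchg(&execlists->overload, 0))
		intel_gt_pm_active_end(engine->gt);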
>
>> However, that will fluctuate quite rapidly. (And runs the risk of
>> exceeding the sentinel.)
>>
>> An alternative approach would be to couple along
>> schedule_in/schedule_out
>>
>> 	atomic_set(overload, -1);
>>
>> __execlists_schedule_in:
>> 	if (!atomic_fetch_inc(overload))
>> 		intel_gt_pm_active_begin(engine->gt);
>>
>> __execlists_schedule_out:
>> 	if (!atomic_dec_return(overload))
>> 		intel_gt_pm_active_end(engine->gt);
>>
>> which would mean we are overloaded as soon as we try to submit an
>> overlapping ELSP.
>>
>
> That sounds good to me too, and AFAICT would have roughly the same
> behavior as this metric except for the preemption corner case you
> mention above. I'll try this and verify that I get approximately the
> same performance numbers.
>
This suggestion seems to lead to some minor regressions; I'm still
investigating the issue.  Will send a v2 as soon as I have something
along the lines of what you suggested running with performance
equivalent to v1.
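
For reference, the counting variant I've been testing looks roughly
like this (sketch based on your pseudocode above, not the final v2):

	/* Init, wherever the rest of the execlists state is set up: */
	atomic_set(&execlists->overload, -1);

	/* __execlists_schedule_in(): second context goes in flight. */
	if (!atomic_fetch_inc(&execlists->overload))
		intel_gt_pm_active_begin(engine->gt);

	/* __execlists_schedule_out(): back to at most one in flight. */
	if (!atomic_dec_return(&execlists->overload))
		intel_gt_pm_active_end(engine->gt);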