[Intel-gfx] [PATCH v2 1/6] drm/i915: Stop tracking timeline->inflight_seqnos

Tue Apr 24 13:55:51 UTC 2018

On 24/04/2018 12:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-24 12:17:15)
>>
>> On 24/04/2018 11:40, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-24 11:14:21)
>>>>
>>>> On 23/04/2018 19:08, Chris Wilson wrote:
>>>>> -static int reserve_engine(struct intel_engine_cs *engine)
>>>>> +static int reserve_gt(struct drm_i915_private *i915)
>>>>>     {
>>>>> -     struct drm_i915_private *i915 = engine->i915;
>>>>> -     u32 active = ++engine->timeline->inflight_seqnos;
>>>>> -     u32 seqno = engine->timeline->seqno;
>>>>>         int ret;
>>>>>     
>>>>> -     /* Reservation is fine until we need to wrap around */
>>>>> -     if (unlikely(add_overflows(seqno, active))) {
>>>>> +     /*
>>>>> +      * Reservation is fine until we may need to wrap around
>>>>> +      *
>>>>> +      * By incrementing the serial for every request, we know that no
>>>>> +      * individual engine may exceed that serial (as each is reset to 0
>>>>> +      * on any wrap). This protects even the most pessimistic of migrations
>>>>> +      * of every request from all engines onto just one.
>>>>> +      */
>>>>
>>>> I didn't really figure out what was wrong with v1? Neither could handle
>>>> more than four billion of simultaneously active requests - but I thought
>>>> that should not concern us. :)
>>>
>>> It was still using the local engine->timeline.seqno as it's base. If we
>>> swapped from one at 0 to another at U32_MAX, we would overflow much
>>> later in submission; after the point of no return.
>>
>> By swapped you already refer to engine change? Ok, I can see that yes.
>> In this case global counter does prevent that.
>>
>> In the light of that, what is your current thinking with regards to
>> mixing engine classes?
> 
> That classes are a hw limitation that doesn't impact on balancing itself,
> just which engines the user is allowed to put into the same group.
> 
>> If the thinking is still to only allow within a class then per-class
>> seqno counter would be an option.
> 
> The goal of localising the seqno here was to try and reduce the locking
> requirements (or at least make it easier to reduce them in future).
> Whether it's one u32 across all engines, or one u32 across a few isn't
> enough for me to worry. The breadcrumb tracking should be happy enough
> (sorted by i915_seqno_passed rather than absolute u32) so the only
> limitation in wrapping should be gen7 HW semaphores. Hmm, with a bit of
> thought, I believe we can reduce the wrap logic to simply skip semaphore
> sync inside the danger zone. Would be worth the effort.

I was thinking about reducing the number of global seqno resets as much 
as we can in general. For instance would it be possible to keep using 
the gt.active_requests together with a new gt.max_engine_seqno? The 
latter would be the maximum last allocated seqno from the engine 
timelines. This way reset would be much less frequent if the load is 
distributed over engines (divided by num engines less frequent).

Regards,

Tvrtko