[Intel-gfx] [PATCH v2 1/6] drm/i915: Stop tracking timeline->inflight_seqnos

Tue Apr 24 15:58:04 UTC 2018

Quoting Tvrtko Ursulin (2018-04-24 15:48:10)
> 
> On 24/04/2018 15:04, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-24 14:55:51)
> >>
> >> On 24/04/2018 12:28, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-04-24 12:17:15)
> >>>>
> >>>> On 24/04/2018 11:40, Chris Wilson wrote:
> >>>>> Quoting Tvrtko Ursulin (2018-04-24 11:14:21)
> >>>>>>
> >>>>>> On 23/04/2018 19:08, Chris Wilson wrote:
> >>>>>>> -static int reserve_engine(struct intel_engine_cs *engine)
> >>>>>>> +static int reserve_gt(struct drm_i915_private *i915)
> >>>>>>>      {
> >>>>>>> -     struct drm_i915_private *i915 = engine->i915;
> >>>>>>> -     u32 active = ++engine->timeline->inflight_seqnos;
> >>>>>>> -     u32 seqno = engine->timeline->seqno;
> >>>>>>>          int ret;
> >>>>>>>      
> >>>>>>> -     /* Reservation is fine until we need to wrap around */
> >>>>>>> -     if (unlikely(add_overflows(seqno, active))) {
> >>>>>>> +     /*
> >>>>>>> +      * Reservation is fine until we may need to wrap around
> >>>>>>> +      *
> >>>>>>> +      * By incrementing the serial for every request, we know that no
> >>>>>>> +      * individual engine may exceed that serial (as each is reset to 0
> >>>>>>> +      * on any wrap). This protects even the most pessimistic of migrations
> >>>>>>> +      * of every request from all engines onto just one.
> >>>>>>> +      */
> >>>>>>
> >>>>>> I didn't really figure out what was wrong with v1? Neither could handle
> >>>>>> more than four billion of simultaneously active requests - but I thought
> >>>>>> that should not concern us. :)
> >>>>>
> >>>>> It was still using the local engine->timeline.seqno as it's base. If we
> >>>>> swapped from one at 0 to another at U32_MAX, we would overflow much
> >>>>> later in submission; after the point of no return.
> >>>>
> >>>> By swapped you already refer to engine change? Ok, I can see that yes.
> >>>> In this case global counter does prevent that.
> >>>>
> >>>> In the light of that, what is your current thinking with regards to
> >>>> mixing engine classes?
> >>>
> >>> That classes are a hw limitation that doesn't impact on balancing itself,
> >>> just which engines the user is allowed to put into the same group.
> >>>
> >>>> If the thinking is still to only allow within a class then per-class
> >>>> seqno counter would be an option.
> >>>
> >>> The goal of localising the seqno here was to try and reduce the locking
> >>> requirements (or at least make it easier to reduce them in future).
> >>> Whether it's one u32 across all engines, or one u32 across a few isn't
> >>> enough for me to worry. The breadcrumb tracking should be happy enough
> >>> (sorted by i915_seqno_passed rather than absolute u32) so the only
> >>> limitation in wrapping should be gen7 HW semaphores. Hmm, with a bit of
> >>> thought, I believe we can reduce the wrap logic to simply skip semaphore
> >>> sync inside the danger zone. Would be worth the effort.
> >>
> >> I was thinking about reducing the number of global seqno resets as much
> >> as we can in general. For instance would it be possible to keep using
> >> the gt.active_requests together with a new gt.max_engine_seqno? The
> >> latter would be the maximum last allocated seqno from the engine
> >> timelines. This way reset would be much less frequent if the load is
> >> distributed over engines (divided by num engines less frequent).
> > 
> > I win with a divide by 0 with removing the global seqno and wrap. :-p
> > 
> > The frequency we are talking about is a short wrap (will take as long as
> > the active request takes to sync) approximately every 47 days divided by
> > N engines. The cost of contention on struct_mutex must surely outweigh
> > that during those 47/N days...
> 
> You may be right, but where did 47 days come from? :)

U32_MAX * assumed throughput of 1ms per request.
-Chris