[Intel-gfx] [PATCH 09/20] drm/i915/gem: Assign context id for async work

Tue Jul 14 14:01:15 UTC 2020

Quoting Tvrtko Ursulin (2020-07-13 13:22:19)
> 
> On 09/07/2020 13:07, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-07-09 12:59:51)
> >>
> >> On 09/07/2020 12:07, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2020-07-09 12:01:29)
> >>>>
> >>>> On 08/07/2020 16:36, Chris Wilson wrote:
> >>>>> Quoting Tvrtko Ursulin (2020-07-08 15:24:20)
> >>>>>> And what is the effective behaviour you get with N contexts - emit N
> >>>>>> concurrent operations and for N + 1 block in execbuf?
> >>>>>
> >>>>> Each context defines a timeline. A task is not ready to run until the
> >>>>> task before it in its timeline is completed. So we don't block in
> >>>>> execbuf, the scheduler waits until the request is ready before putting
> >>>>> it into the HW queues -- i.e. the normal chain of fences with everything
> >>>>> that entails about ensuring it runs to completion [whether successfully
> >>>>> or not, if not we then rely on the error propagation to limit the damage
> >>>>> and report it back to the user if they kept a fence around to inspect].
> >>>>
> >>>> Okay but what is the benefit of N contexts in this series, before the
> >>>> work is actually spread over ctx async width CPUs? Is there any? If not
> >>>> I would prefer this patch is delayed until the time some actual
> >>>> parallelism is ready to be added.
> >>>
> >>> We currently submit an unbounded amount of work. This patch is added
> >>> along with its user to restrict the amount of work allowed to run in
> >>> parallel, and also is used to [crudely] serialise the multiple threads
> >>> attempting to allocate space in the vm when we completely exhaust that
> >>> address space. We need at least one fence-context id for each user, this
> >>> took the opportunity to generalise that to N ids for each user.
> >>
> >> Right, this is what I asked at the beginning - restricting amount of
> >> work run in parallel - does it mean there is some "blocking"/serialisation
> >> during execbuf? Or it is all async but then what is restricted?
> > 
> > It's all* async, so the number of workqueues we utilise is restricted,
> > and so limits the number of CPUs we allow the one context to spread
> > across with multiple execbufs.
> > 
> > *fsvo all.
> 
> Okay.
> 
> Related topic - have we ever thought about what happens when fence 
> context id wraps? I know it's 64-bit, and even with this patch giving 
> out num_cpus blocks, it still feels impossible that it would wrap in 
> normal use. But I wonder if a malicious client could create/destroy 
> contexts to cause a wrap and then how well we handle it. I am probably 
> just underestimating today how big 64-bit is and how many ioctls that 
> would require..

I've had cold sweats. We will get silent glitches. I *don't* think we
will corrupt kernel data and oops, but we will corrupt user data.
-Chris
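
To put a rough number on the wrap question above, here is a minimal userspace sketch, not the kernel code. It assumes fence context ids come from a single 64-bit counter that is advanced by a block of ids per allocation, as the thread describes. The names context_alloc(), next_context_id and years_to_wrap(), and the rate of one million allocations per second, are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a global 64-bit fence-context counter. */
static uint64_t next_context_id = 1;

/*
 * Hand out a contiguous block of `num` ids and return the first one.
 * Unsigned arithmetic wraps modulo 2^64, so after enough allocations
 * the counter silently starts reusing ids that may still be live.
 */
static uint64_t context_alloc(unsigned int num)
{
	uint64_t first = next_context_id;

	next_context_id += num;
	return first;
}

/*
 * Estimate how many years it takes to consume the whole 2^64 id space
 * when every allocation takes `ids_per_alloc` ids and allocations
 * arrive at `allocs_per_sec` (both are assumed inputs, not measurements).
 */
static double years_to_wrap(unsigned int ids_per_alloc, double allocs_per_sec)
{
	const double id_space = 18446744073709551616.0; /* 2^64 */
	const double secs_per_year = 365.25 * 24.0 * 3600.0;

	return id_space / ((double)ids_per_alloc * allocs_per_sec) / secs_per_year;
}
```

Under those assumptions, years_to_wrap(64, 1e6) comes out at roughly nine thousand years, which matches the intuition that a wrap is out of reach in normal use. The sketch says nothing about what happens after a wrap; it only shows that the counter itself would carry on without any error.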