[Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
Bloomfield, Jon
jon.bloomfield at intel.com
Wed Aug 7 15:29:55 UTC 2019
> -----Original Message-----
> From: Chris Wilson <chris at chris-wilson.co.uk>
> Sent: Wednesday, August 7, 2019 8:08 AM
> To: Bloomfield, Jon <jon.bloomfield at intel.com>;
> intel-gfx at lists.freedesktop.org
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Winiarski, Michal
> <michal.winiarski at intel.com>
> Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
>
> Quoting Bloomfield, Jon (2019-08-07 15:33:51)
> [skip to end]
> > We didn't explore the idea of terminating orphaned contexts though
> > (where none of their resources are referenced by any other contexts). Is
> > there a reason why this is not feasible? In the case of compute (certainly
> > HPC) workloads, there would be no compositor taking the output so this
> > might be a solution.
>
> Sounds easier said than done. We have to go through each request and
> determine if it has an external reference (or if the object holding the
> reference has an external reference) to see if the output would be
> visible to a third party. Sounds like a conservative GC :|
> (Coming to that conclusion suggests that we should structure the request
> tracking to make reparenting easier.)
>
> We could take a pid-1 approach and move all the orphan timelines over to
> a new parent purely responsible for them. That honestly doesn't seem to
> achieve anything. (We are still stuck with tasks on the GPU and no way
> to kill them.)
>
> In comparison, persistence is a rarely used "feature" and cleaning up on
> context close fits in nicely with the process model. It just works as
> most users/clients would expect. (Although running in non-persistent
> by default hasn't shown anything to explode on the desktop, it's too easy
> to construct scenarios where persistence turns out to be an advantage,
> particularly with chains of clients (the compositor model).) Between the
> two modes, we should have most bases covered, it's hard to argue for a
> third way (that is until someone has a usecase!)
> -Chris
Ok, makes sense. Thanks.
But have we converged on a decision? :-)
As I said, requiring compute UMD opt-in should be OK for the immediate HPC issue, but I'd personally argue that it's valid to change the contract for hangcheck=0 and switch the default to non-persistent.
Jon