[Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close

Bloomfield, Jon jon.bloomfield at intel.com
Wed Aug 7 17:12:12 UTC 2019


> -----Original Message-----
> From: Chris Wilson <chris at chris-wilson.co.uk>
> Sent: Wednesday, August 7, 2019 9:51 AM
> To: Bloomfield, Jon <jon.bloomfield at intel.com>; intel-
> gfx at lists.freedesktop.org
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Winiarski, Michal
> <michal.winiarski at intel.com>
> Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> 
> Quoting Bloomfield, Jon (2019-08-07 16:29:55)
> > > -----Original Message-----
> > > From: Chris Wilson <chris at chris-wilson.co.uk>
> > > Sent: Wednesday, August 7, 2019 8:08 AM
> > > To: Bloomfield, Jon <jon.bloomfield at intel.com>; intel-
> > > gfx at lists.freedesktop.org
> > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Winiarski, Michal
> > > <michal.winiarski at intel.com>
> > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> > >
> > > Quoting Bloomfield, Jon (2019-08-07 15:33:51)
> > > [skip to end]
> > > > We didn't explore the idea of terminating orphaned contexts though
> > > (where none of their resources are referenced by any other contexts). Is
> > > there a reason why this is not feasible? In the case of compute (certainly
> > > HPC) workloads, there would be no compositor taking the output so this
> > > might be a solution.
> > >
> > > Sounds easier said than done. We have to go through each request and
> > > determine if it has an external reference (or if the object holding the
> > > reference has an external reference) to see if the output would be
> > > visible to a third party. Sounds like a conservative GC :|
> > > (Coming to that conclusion suggests that we should structure the request
> > > tracking to make reparenting easier.)
> > >
> > > We could take a pid-1 approach and move all the orphan timelines over to
> > > a new parent purely responsible for them. That honestly doesn't seem to
> > > achieve anything. (We are still stuck with tasks on the GPU and no way
> > > to kill them.)
> > >
> > > In comparison, persistence is a rarely used "feature" and cleaning up on
> > > context close fits in nicely with the process model. It just works as
> > > most users/clients would expect. (Although running non-persistent
> > > by default hasn't shown anything exploding on the desktop, it's too easy
> > > to construct scenarios where persistence turns out to be an advantage,
> > > particularly with chains of clients (the compositor model).) Between the
> > > two modes, we should have most bases covered; it's hard to argue for a
> > > third way (that is until someone has a usecase!)
> > > -Chris
> >
> > Ok, makes sense. Thanks.
> >
> > But have we converged on a decision? :-)
> >
> > As I said, requiring compute UMD opt-in should be OK for the immediate HPC
> > issue, but I'd personally argue that it's valid to change the contract for
> > hangcheck=0 and switch the default to non-persistent.
> 
> Could you tender
> 
> diff --git a/runtime/os_interface/linux/drm_neo.cpp b/runtime/os_interface/linux/drm_neo.cpp
> index 31deb68b..8a9af363 100644
> --- a/runtime/os_interface/linux/drm_neo.cpp
> +++ b/runtime/os_interface/linux/drm_neo.cpp
> @@ -141,11 +141,22 @@ void Drm::setLowPriorityContextParam(uint32_t drmContextId) {
>      UNRECOVERABLE_IF(retVal != 0);
>  }
> 
> +void Drm::setNonPersistent(uint32_t drmContextId) {
> +    drm_i915_gem_context_param gcp = {}; /* zero-init leaves gcp.value = 0, i.e. non-persistent */
> +    gcp.ctx_id = drmContextId;
> +    gcp.param = 0xb; /* I915_CONTEXT_PARAM_PERSISTENCE, pending uapi header update */
> +
> +    ioctl(DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &gcp);
> +}
> +
>  uint32_t Drm::createDrmContext() {
>      drm_i915_gem_context_create gcc = {};
>      auto retVal = ioctl(DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &gcc);
>      UNRECOVERABLE_IF(retVal != 0);
> 
> +    /* enable cleanup of resources on process termination */
> +    setNonPersistent(gcc.ctx_id);
> +
>      return gcc.ctx_id;
>  }
> 
> to interested parties?
> -Chris
Yes, that's exactly what I had in mind. I think it's enough to resolve the HPC challenges.
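
For anyone wiring this up outside NEO, here's a minimal standalone sketch of the same opt-out against the raw i915 uAPI. Untested, and hedged the same way as the diff above: the 0xb literal stands in for I915_CONTEXT_PARAM_PERSISTENCE until the uapi header update lands, and the render node path is only an example.

#include <cstdio>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <drm/i915_drm.h>

int main() {
    int fd = open("/dev/dri/renderD128", O_RDWR); /* example render node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* create a fresh GEM context */
    drm_i915_gem_context_create gcc = {};
    if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &gcc)) {
        perror("context create");
        close(fd);
        return 1;
    }

    /* opt out of persistence: value 0 asks the kernel to cancel any
     * outstanding work when the context (or the fd) is closed */
    drm_i915_gem_context_param gcp = {};
    gcp.ctx_id = gcc.ctx_id;
    gcp.param = 0xb; /* I915_CONTEXT_PARAM_PERSISTENCE, pending uapi headers */
    gcp.value = 0;
    if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &gcp))
        perror("setparam (older kernels reject the param)");

    /* ... submit work against gcc.ctx_id; it is cancelled on close ... */
    close(fd);
    return 0;
}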

