[Intel-gfx] [PATCH v2] drm/i915/gem: Move obj->lut_list under its own lock

Chris Wilson chris at chris-wilson.co.uk
Tue Jun 30 19:28:32 UTC 2020


Quoting Ruhl, Michael J (2020-06-30 20:11:16)
> >-----Original Message-----
> >From: Intel-gfx <intel-gfx-bounces at lists.freedesktop.org> On Behalf Of Chris
> >Wilson
> >Sent: Monday, June 29, 2020 7:36 AM
> >To: intel-gfx at lists.freedesktop.org
> >Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >Subject: [Intel-gfx] [PATCH v2] drm/i915/gem: Move obj->lut_list under its
> >own lock
> >
> >The obj->lut_list is traversed when the object is closed as the file
> >table is destroyed during process termination. As this occurs before we
> >kill any outstanding context if, due to some bug or another, the closure
> >is blocked, then we fail to shootdown any inflight operations
> >potentially leaving the GPU spinning forever. As we only need to guard
> >the list against concurrent closures and insertions, the hold is short
> >and merits being treated as a simple spinlock.
> 
> The comment:
> 
> /* Break long locks, and carefully continue on from this spot */
> 
> seems to be contrary to your "the hold is short" comment.  Is calling out
> this apparent worst case scenario (i.e. how it could happen), worth
> documenting?

It's paranoia, because list iterating can be slow and historically slow
object closure has been reported as an issue. Only a few objects will be
shared between multiple contexts, and even then you would only expect a
couple of contexts to share an object. One of the OglDrvCtx would show up
here, which convinced us to move to the per-object lists to shrink the
number of elements walked.

Even the close race igts do not cause the list to become long enough to
schedule, but it would be possible to create an object that was shared
by 64k contexts. Just not wise in practice. [However, I should make sure
an igt does hit the bookmark.]

> >@@ -104,21 +105,29 @@ void i915_gem_close_object(struct
> >drm_gem_object *gem, struct drm_file *file)
> > {
> >       struct drm_i915_gem_object *obj = to_intel_bo(gem);
> >       struct drm_i915_file_private *fpriv = file->driver_priv;
> >+      struct i915_lut_handle bookmark = {};
> >       struct i915_mmap_offset *mmo, *mn;
> >       struct i915_lut_handle *lut, *ln;
> >       LIST_HEAD(close);
> >
> >-      i915_gem_object_lock(obj);
> >+      spin_lock(&obj->lut_lock);
> >       list_for_each_entry_safe(lut, ln, &obj->lut_list, obj_link) {
> >               struct i915_gem_context *ctx = lut->ctx;
> >
> >-              if (ctx->file_priv != fpriv)
> >-                      continue;
> >+              if (ctx && ctx->file_priv == fpriv) {
> 
> Null checking ctx was not done before.  I think this changed with the possible cond_resched?

The bookmark introduces the lut->ctx == NULL to identify it as being
special, hence the need to check.
-Chris


More information about the Intel-gfx mailing list