[Intel-gfx] [PATCH] drm/i915: Reduce context HW ID lifetime

Tue Sep 4 13:48:08 UTC 2018

Quoting Tvrtko Ursulin (2018-09-03 10:59:01)
> 
> On 31/08/2018 13:36, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-30 17:23:43)
> >>
> >> On 30/08/2018 11:24, Chris Wilson wrote:
> >>> +static int assign_hw_id(struct drm_i915_private *i915, unsigned int *out)
> >>> +{
> >>> +     int ret;
> >>> +
> >>> +     lockdep_assert_held(&i915->contexts.mutex);
> >>> +
> >>> +     ret = new_hw_id(i915, GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> >>> +     if (unlikely(ret < 0)) {
> >>> +             ret = steal_hw_id(i915);
> >>> +             if (ret < 0) /* once again for the correct erro code */
> >>
> >> errno
> >>
> >>> +                     ret = new_hw_id(i915, GFP_KERNEL);
> >>
> >> Hmm.. shouldn't you try GFP_KERNEL before attempting to steal? Actually
> >> I think you should branch based on -ENOSPC (steal) vs -ENOMEM (retry
> >> with GFP_KERNEL). Which would actually mean something like:
> > 
> > I was applying the same strategy as we use elsewhere. Penalise any
> > driver cache before hitting reclaim.
> > 
> > I think that is fair from an application of soft backpressure point of
> > view. (Lack of backpressure is probably a sore point for many.)
> 
> My concern was lack of a phase which avoids hw id stealing for loads 
> with few contexts but heavy memory pressure. Sounded like a thing worth 
> "robustifying" against - you don't think so?

Do we care much at the point where we fail to direct reclaim a page for
the ida allocator?

It's a tough call, and I think erring on the side of the rest of the
system vs new requests is best overall in an enlightened self-interest
pov. I completely agree we can construct cases where giving up amounts
to priority-inversion and an unfortunate DoS of important clients, but
my gut feeling is that the typical desktop would remain more responsive
with i915 giving up first.
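
For reference, the ordering being debated (cheap allocation, then steal,
then one blocking allocation for the error code) can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not the patch: every model_* name, the fixed-size pool and
the mem_pressure/reclaim_works flags are hypothetical stand-ins for the
ida, the GFP flags and reclaim behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the fallback order, not the i915 code. */
enum { MODEL_MAX_IDS = 4 };

struct model_pool {
	bool in_use[MODEL_MAX_IDS];	/* ID is currently assigned */
	bool pinned[MODEL_MAX_IDS];	/* assigned and not stealable */
	bool mem_pressure;		/* non-blocking attempts fail with -ENOMEM */
	bool reclaim_works;		/* blocking attempt succeeds despite pressure */
};

/* Stand-in for new_hw_id(): may_reclaim models a plain GFP_KERNEL attempt. */
static int model_new_id(struct model_pool *p, bool may_reclaim)
{
	if (p->mem_pressure && !(may_reclaim && p->reclaim_works))
		return -ENOMEM;

	for (int i = 0; i < MODEL_MAX_IDS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* Stand-in for steal_hw_id(): hand over the first unpinned ID in use. */
static int model_steal_id(struct model_pool *p)
{
	for (int i = 0; i < MODEL_MAX_IDS; i++) {
		if (p->in_use[i] && !p->pinned[i])
			return i;	/* stays in use, only the owner changes */
	}
	return -ENOENT;
}

/* Same order as the quoted assign_hw_id(): cheap attempt, steal, full attempt. */
static int model_assign_id(struct model_pool *p, bool *stolen)
{
	int ret;

	*stolen = false;
	ret = model_new_id(p, false);
	if (ret < 0) {
		ret = model_steal_id(p);
		if (ret >= 0)
			*stolen = true;
		else	/* once again for the correct error code */
			ret = model_new_id(p, true);
	}
	return ret;
}
```

With a single unpinned ID in use and mem_pressure set, model_assign_id()
steals that ID even though the pool is nearly empty, which is the case
Tvrtko is worried about. Swapping the order of the steal and the blocking
attempt in model_assign_id() gives the alternative he suggests.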

Thank goodness we are not RT.
-Chris