[Intel-gfx] [PATCH 2/3] drm/i915: Priority boost for locked waits

Chris Wilson chris at chris-wilson.co.uk
Mon Jan 23 11:50:09 UTC 2017


On Mon, Jan 23, 2017 at 11:41:03AM +0000, Tvrtko Ursulin wrote:
> 
> On 23/01/2017 10:51, Chris Wilson wrote:
> >On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
> >>>@@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >>>		ret = i915_gem_object_wait(obj,
> >>>					   I915_WAIT_INTERRUPTIBLE |
> >>>					   I915_WAIT_LOCKED |
> >>>+					   I915_WAIT_PRIORITY |
> >>>					   I915_WAIT_ALL,
> >>>					   MAX_SCHEDULE_TIMEOUT,
> >>>					   NULL);
> >>
> >>As mentioned before, is this not a concern? Is it not letting any
> >>userspace boost their prio to max by just calling set cache level
> >>after execbuf?
> >
> >Not any more, set-cache-ioctl now does an explicit unlocked wait first
> >before hitting this wait. Also, the likely cause here is a page-flip
> >after execbuf on a fresh bo, which is a stall we don't want.
> 
> Ok I've missed that change.
> 
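For reference, the ordering in the ioctl is now roughly as below (a
simplified sketch, not the exact code: the function name is made up and
error paths are trimmed).

/* Sketch: do the long wait first, unlocked and unboosted, so the caller
 * just sleeps at its own priority and cannot use the later locked,
 * boosted wait to jump the queue.
 */
static int set_cache_level_sketch(struct drm_device *dev,
				  struct drm_i915_gem_object *obj,
				  enum i915_cache_level level)
{
	int ret;

	ret = i915_gem_object_wait(obj,
				   I915_WAIT_INTERRUPTIBLE |
				   I915_WAIT_ALL,
				   MAX_SCHEDULE_TIMEOUT,
				   NULL);
	if (ret)
		return ret;

	ret = i915_mutex_lock_interruptible(dev);
	if (ret)
		return ret;

	/* The locked, I915_WAIT_PRIORITY wait inside here should now find
	 * the object idle; the boost only covers whatever slipped in
	 * between the two waits.
	 */
	ret = i915_gem_object_set_cache_level(obj, level);

	mutex_unlock(&dev->struct_mutex);
	return ret;
}
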
> >>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>@@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
> >>>		return -ENOSPC;
> >>>
> >>>	timeout = i915_wait_request(target,
> >>>-				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> >>>+				    I915_WAIT_INTERRUPTIBLE |
> >>>+				    I915_WAIT_LOCKED |
> >>>+				    I915_WAIT_PRIORITY,
> >>>				    MAX_SCHEDULE_TIMEOUT);
> >>
> >>This one also looks worrying unless I am missing something. Allowing
> >>clients who fill the ring to promote their priority?
> >
> >Yes. They only boost priority for very, very old requests and, more
> >importantly, these clients are now stalling the entire *system* and not
> >just themselves anymore. So there is an implicit priority inversion
> >through struct_mutex. The only long term solution is avoiding
> >inter-client locks - we still may have inversion on any shared resource,
> >most likely objects, but we can at least reduce the contention by
> >splitting and avoiding struct_mutex.
> 
> How do you know they are stalling the entire system - haven't they
> just filled up their ringbuf? So the target request will be one of
> theirs.

struct_mutex is our BKL. I view anything taking it with suspicion,
because it stops other clients, pageflips, eventually everything.
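
To be clear, the boost is only a hint to the scheduler about the request
being waited for; the idea is roughly as below (illustrative sketch only,
not the patch itself; the priority constant is a placeholder).

/* If the waiter is about to block while holding struct_mutex, ask the
 * scheduler to bump the target request so the lock holder is not stuck
 * behind lower-priority work from other clients.
 */
static void boost_wait_sketch(struct drm_i915_gem_request *req,
			      unsigned int flags)
{
	if (!(flags & I915_WAIT_PRIORITY))
		return;

	/* Only engines with a scheduler (execlists) can reorder. */
	if (req->engine->schedule)
		req->engine->schedule(req, I915_PRIORITY_MAX /* placeholder */);
}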
 
> Once the scheduler is able to do fair timeslicing or something
> similar, we especially should not allow clients to prioritise
> themselves just by filling their ringbuf.

That still misses the impact on the rest of the system of holding
struct_mutex for any period of time.
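
And to be concrete about the target: wait_for_space() only ever waits on
an old request already emitted on that ring, i.e. one of the caller's own.
Paraphrased (space_after() is a made-up helper standing in for the real
ring-space arithmetic):

	/* Walk the requests already emitted on this ring, oldest first,
	 * and pick the first whose retirement frees enough room.
	 */
	list_for_each_entry(target, &ring->request_list, ring_link) {
		if (space_after(ring, target) >= bytes)
			break;
	}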
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

