[Intel-gfx] [PATCH 2/3] drm/i915: Priority boost for locked waits

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Jan 23 11:56:09 UTC 2017


On 23/01/2017 11:50, Chris Wilson wrote:
> On Mon, Jan 23, 2017 at 11:41:03AM +0000, Tvrtko Ursulin wrote:
>>
>> On 23/01/2017 10:51, Chris Wilson wrote:
>>> On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
>>>>> @@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>>> 		ret = i915_gem_object_wait(obj,
>>>>> 					   I915_WAIT_INTERRUPTIBLE |
>>>>> 					   I915_WAIT_LOCKED |
>>>>> +					   I915_WAIT_PRIORITY |
>>>>> 					   I915_WAIT_ALL,
>>>>> 					   MAX_SCHEDULE_TIMEOUT,
>>>>> 					   NULL);
>>>>
>>>> As mentioned before, is this not a concern? Is it not letting any
>>>> userspace boost their prio to max by just calling set cache level
>>>> after execbuf?
>>>
>>> Not any more, set-cache-ioctl now does an explicit unlocked wait first
>>> before hitting this wait. Also, the likely cause is a page-flip after
>>> execbuf on a fresh bo, which is a stall we don't want.
>>
>> Ok I've missed that change.
>>
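
(Spelling out the ordering for the record: the ioctl now does an
unlocked wait before taking struct_mutex, so by the time the locked
I915_WAIT_PRIORITY wait runs the object is normally idle already. A
rough sketch only, with error handling trimmed; the helper names, e.g.
i915_mutex_lock_interruptible(), are from memory rather than from the
patch, so treat them as approximate.)

static int set_caching_sketch(struct drm_device *dev,
                              struct drm_i915_gem_object *obj,
                              enum i915_cache_level level)
{
        int ret;

        /* First wait with no lock held: no priority boost, and no
         * other client is blocked behind struct_mutex while we wait. */
        ret = i915_gem_object_wait(obj,
                                   I915_WAIT_INTERRUPTIBLE,
                                   MAX_SCHEDULE_TIMEOUT,
                                   NULL);
        if (ret)
                return ret;

        ret = i915_mutex_lock_interruptible(dev);
        if (ret)
                return ret;

        /* The locked wait inside i915_gem_object_set_cache_level() now
         * only covers work queued since the unlocked wait, so the
         * I915_WAIT_PRIORITY boost is there to resolve the inversion
         * through struct_mutex, not to let userspace promote itself
         * cheaply. */
        ret = i915_gem_object_set_cache_level(obj, level);
        mutex_unlock(&dev->struct_mutex);

        return ret;
}
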
>>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> @@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>>>>> 		return -ENOSPC;
>>>>>
>>>>> 	timeout = i915_wait_request(target,
>>>>> -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
>>>>> +				    I915_WAIT_INTERRUPTIBLE |
>>>>> +				    I915_WAIT_LOCKED |
>>>>> +				    I915_WAIT_PRIORITY,
>>>>> 				    MAX_SCHEDULE_TIMEOUT);
>>>>
>>>> This one also looks worrying unless I am missing something. Allowing
>>>> clients who fill the ring to promote their priority?
>>>
>>> Yes. They only boost priority for very, very old requests and, more
>>> importantly, these clients are now stalling the entire *system* and not
>>> just themselves anymore. So there is an implicit priority inversion
>>> through struct_mutex. The only long-term solution is avoiding
>>> inter-client locks - we still may have inversion on any shared resource,
>>> most likely objects, but we can at least reduce the contention by
>>> splitting and avoiding struct_mutex.
>>
>> How do you know they are stalling the entire system - haven't they
>> just filled up their ringbuf? So the target request will be one of
>> theirs.
>
> struct_mutex is our BKL. I view anything taking it with suspicion,
> because it stops other clients, pageflips, eventually everything.
>
>> Once the scheduler is able to do fair timeslicing or something similar,
>> we especially should not allow clients to prioritise themselves by
>> just filling their ringbuf.
>
> That is still missing the impact on the system of holding struct_mutex
> for any period of time.

True, but I don't think priority boosting is the solution for that. I
think we should instead look at approaches where the clients who fill
up their ringbufs wait outside struct_mutex.
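
To illustrate the kind of thing I mean, very roughly (the
find_request_freeing_enough_space() helper below is made up for the
example, and I am conveniently ignoring that dropping the lock in the
middle of request construction brings its own set of problems):

static int wait_for_space_unlocked(struct drm_i915_gem_request *req,
                                   int bytes)
{
        struct intel_ring *ring = req->ring;
        struct drm_i915_gem_request *target;
        long timeout;

        for (;;) {
                /* Hypothetical helper: find the oldest request whose
                 * retirement frees enough ring space, or NULL if the
                 * ring already has room. */
                target = find_request_freeing_enough_space(ring, bytes);
                if (!target)
                        return 0;

                i915_gem_request_get(target);
                mutex_unlock(&req->i915->drm.struct_mutex);

                /* Stall only ourselves: no lock held, so no priority
                 * boost is needed to avoid blocking other clients. */
                timeout = i915_wait_request(target,
                                            I915_WAIT_INTERRUPTIBLE,
                                            MAX_SCHEDULE_TIMEOUT);

                mutex_lock(&req->i915->drm.struct_mutex);
                i915_gem_request_put(target);

                if (timeout < 0)
                        return timeout;

                /* The lock was dropped, so re-evaluate from scratch. */
        }
}

The point being that the stall for ring space is then paid only by the
client which filled its ring, without holding up everybody else behind
struct_mutex and without having to touch priorities at all.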

Regards,

Tvrtko

