[Intel-gfx] [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Apr 9 10:17:04 UTC 2018
On 09/04/2018 10:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-09 10:11:53)
>>
>> On 06/04/2018 21:17, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Keep a count of requests submitted from userspace and not yet runnable due
>>>> to unresolved dependencies.
>>>>
>>>> v2: Rename and move under the container struct. (Chris Wilson)
>>>> v3: Rebase.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/i915_request.c | 3 +++
>>>> drivers/gpu/drm/i915/intel_engine_cs.c | 3 ++-
>>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>>> 3 files changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>>> index 5c01291ad1cc..152321655fe6 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>> rcu_read_lock();
>>>> request->engine->submit_request(request);
>>>> rcu_read_unlock();
>>>> + atomic_dec(&request->engine->request_stats.queued);
>>>
>>> But we use atomic here? Might as well use atomic for
>>> request_stats.runnable here as well?
>>
>> I admit it can read a bit uneven.
>>
>> For runnable I wanted to avoid another atomic by putting it under the
>> engine timeline lock.
>>
>> But for queued I did not want to start taking the same lock when adding
>> a request.
>>
>> Your proposal to make runnable atomic_t and move to submit_notify would
>> indeed simplify things, but at a cost of one more atomic in that path.
>> Perhaps the code path is heavy enough that one new atomic would be
>> completely hidden in it, and the code simplification would win?
>
> It also solidifies that we are moving from one counter to the next.
> (There must be some common 64b cmpxchg for doing that!) Going from +1
> locked operations to +2 here isn't the end of the world, but I can
> certainly appreciate trying to keep the number down (especially for aux
> information like stats).
>
> 	now = atomic64_read(&stats.queued_runnable);
> 	do {
> 		old = now;
> 		new_queued = upper_32_bits(old) - 1;
> 		new_runnable = lower_32_bits(old) + 1;
> 		now = atomic64_cmpxchg(&stats.queued_runnable,
> 				       old, (new_runnable | (u64)new_queued << 32));
> 	} while (now != old);
Hm, I don't know; we have to be careful with these retry loops. More
importantly, I am not sure it isn't overkill.
> Downside being that we either then use atomic64 throughout or we mix
> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> must have solved this problem in a much neater way, before they went to
> per-cpu stats ;)
Is the winky implying you know who and where? :) We have three potential
solutions now, even before counting whatever the winky may be suggesting.
For me it is still a choice between what I have now and simplifying the
code paths by going with another atomic_t.
Regards,
Tvrtko