[Intel-gfx] [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Apr 9 10:17:04 UTC 2018
On 09/04/2018 10:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-09 10:11:53)
>>
>> On 06/04/2018 21:17, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Keep a count of requests submitted from userspace and not yet runnable due
>>>> to unresolved dependencies.
>>>>
>>>> v2: Rename and move under the container struct. (Chris Wilson)
>>>> v3: Rebase.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/i915_request.c | 3 +++
>>>> drivers/gpu/drm/i915/intel_engine_cs.c | 3 ++-
>>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
>>>> 3 files changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>>> index 5c01291ad1cc..152321655fe6 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>> rcu_read_lock();
>>>> request->engine->submit_request(request);
>>>> rcu_read_unlock();
>>>> + atomic_dec(&request->engine->request_stats.queued);
>>>
>>> But we use atomic here? Might as well use atomic for
>>> request_stats.runnable here as well?
>>
>> I admit it can read a bit uneven.
>>
>> For runnable I wanted to avoid another atomic by putting it under the
>> engine timeline lock.
>>
>> But for queued I did not want to start taking the same lock when adding
>> a request.
>>
>> Your proposal to make runnable atomic_t and move to submit_notify would
>> indeed simplify things, but at a cost of one more atomic in that path.
>> Perhaps the code path is heavy enough that one new atomic would be
>> completely hidden in it, and the code simplification would win?
>
> It also solidifies that we are moving from one counter to the next.
> (There must be some common 64b cmpxchg for doing that!) Going from +1
> locked operations to +2 here isn't the end of the world, but I can
> certainly appreciate trying to keep the number down (especially for aux
> information like stats).
>
> 	now = atomic64_read(&stats.queued_runnable);
> 	do {
> 		old = now;
> 		new_queued = upper_32_bits(old) - 1;
> 		new_runnable = lower_32_bits(old) + 1;
> 		now = atomic64_cmpxchg(&stats.queued_runnable,
> 				       old, (new_runnable | (u64)new_queued << 32));
> 	} while (now != old);
Hm, I don't know; we have to be careful with these retry loops. More
importantly, I am not sure it isn't overkill.
> Downside being that we either then use atomic64 throughout or we mix
> atomic32/atomic64 knowing that we're on x86. (I feel like someone else
> must have solved this problem in a much neater way, before they went to
> per-cpu stats ;)
Is the winky implying you know who and where? :) We have three potential
solutions now, even before counting whatever the winky may be suggesting.
For me it is still a choice between what I have now and simplifying the
code paths by going with another atomic_t.
Regards,
Tvrtko