[Intel-gfx] [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace
Chris Wilson
chris at chris-wilson.co.uk
Mon Apr 9 09:25:13 UTC 2018
Quoting Tvrtko Ursulin (2018-04-09 10:11:53)
>
> On 06/04/2018 21:17, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Keep a count of requests submitted from userspace and not yet runnable due
> >> unresolved dependencies.
> >>
> >> v2: Rename and move under the container struct. (Chris Wilson)
> >> v3: Rebase.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >> ---
> >> drivers/gpu/drm/i915/i915_request.c | 3 +++
> >> drivers/gpu/drm/i915/intel_engine_cs.c | 3 ++-
> >> drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
> >> 3 files changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> >> index 5c01291ad1cc..152321655fe6 100644
> >> --- a/drivers/gpu/drm/i915/i915_request.c
> >> +++ b/drivers/gpu/drm/i915/i915_request.c
> >> @@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >> rcu_read_lock();
> >> request->engine->submit_request(request);
> >> rcu_read_unlock();
> >> + atomic_dec(&request->engine->request_stats.queued);
> >
> > But we use atomic here? Might as well use atomic for
> > request_stats.runnable here as well?
>
> I admit it can read a bit uneven.
>
> For runnable I wanted to avoid another atomic by putting it under the
> engine timeline lock.
>
> But for queued I did not want to start taking the same lock when adding
> a request.
>
> Your proposal to make runnable atomic_t and move to submit_notify would
> indeed simplify things, but at a cost of one more atomic in that path.
> Perhaps the code path is heavy enough for one new atomic to be
> completely hidden in it, and code simplification to win?
It also solidifies that we are moving from one counter to the next.
(There must be some common 64b cmpxchg for doing that!) Going from +1
locked operations to +2 here isn't the end of the world, but I can
certainly appreciated trying to keep the number down (especially for aux
information like stats).
now = atomic64_read(&stats.queued_runnable);
do {
old = now;
new_queued = upper_32_bits(old) - 1;
new_runnable = lower_32_bits(old) + 1;
now = atomic64_cmpxchg(&stats.queued_runnable,
old, (new_runnable | (u64)new_queued << 32));
} while (now != old);
Downside being that we either then use atomic64 throughout or we mix
atomic32/atomic64 knowing that we're on x86. (I feel like someone else
must have solved this problem in a much neater way, before they went to
per-cpu stats ;)
-Chris
More information about the Intel-gfx
mailing list