[Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking
Chris Wilson
chris at chris-wilson.co.uk
Wed Jul 19 10:46:31 UTC 2017
Quoting Tvrtko Ursulin (2017-07-19 10:12:33)
>
> On 18/07/2017 16:19, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-07-18 15:36:15)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Track total time requests have been executing on the hardware.
> >>
> >> To make this cheap it is hidden behind a static branch with the
> >> intention that it is only enabled when there is a consumer
> >> listening. This means that in the default off case the total
> >> cost of the tracking is just a few no-op instructions on the
> >> fast paths.
> >
> >> +static inline void intel_engine_context_in(struct intel_engine_cs *engine)
> >> +{
> >> + if (static_branch_unlikely(&i915_engine_stats_key)) {
> >> + unsigned long flags;
> >> +
> >> + spin_lock_irqsave(&engine->stats.lock, flags);
> >
> > What's the purpose of this lock? RMW is ordered by virtue of the tasklet
> > (only one cpu can be doing the rmw at any time). Did I miss another
> > user?
>
> Hm it should be a plain spin_lock and a _bh variant on the external API.
>
> But the purpose is to allow atomic sampling of accumulated busy time
> plus the current engine status.
You can do that with a lockless read, if you treat it as a seqlock and
the total as the latch. Something like
u64 total, latch, start;
total = engine->stats.total;
do {
latch = total;
start = engine->stats.start;
smp_rmb();
total = engine->stats.total;
} while (total != latch);
total += ktime_now() - latch;
Assuming x86-64. If we only have 32b atomic reads, it is less fun.
> >> + if (engine->stats.ref++ == 0)
> >> + engine->stats.start = ktime_get_real_ns();
> >
> > Use ktime_get_raw() and leave the conversion to ns to the caller.
>
> You mean both store in ktime_t and don't fetch the wall time but
> monotonic? Do you say this because perf pmu perhaps needs monotonic or
> for some other reason?
We only want monotonic (otherwise we will observe time going backwards), but
whether or not we want the clock source compensation? I was under the
impression we could defer that compensation until later.
> > What is the cost of a ktime nowadays?
>
> I don't see it in the profiles, so not much I think.
readtsc is there, but not enough to worry. The effect on execution
latency is in the noise on initial inspection.
-Chris
More information about the Intel-gfx
mailing list