[Intel-gfx] [RFC 11/14] drm/i915: Engine busy time tracking

Chris Wilson chris at chris-wilson.co.uk
Wed Jul 19 10:46:31 UTC 2017


Quoting Tvrtko Ursulin (2017-07-19 10:12:33)
> 
> On 18/07/2017 16:19, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-07-18 15:36:15)
> >> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >> Track total time requests have been executing on the hardware.
> >>
> >> To make this cheap it is hidden behind a static branch with the
> >> intention that it is only enabled when there is a consumer
> >> listening. This means that in the default off case the total
> >> cost of the tracking is just a few no-op instructions on the
> >> fast paths.
> > 
> >> +static inline void intel_engine_context_in(struct intel_engine_cs *engine)
> >> +{
> >> +       if (static_branch_unlikely(&i915_engine_stats_key)) {
> >> +               unsigned long flags;
> >> +
> >> +               spin_lock_irqsave(&engine->stats.lock, flags);
> > 
> > What's the purpose of this lock? RMW is ordered by virtue of the tasklet
> > (only one cpu can be doing the rmw at any time). Did I miss another
> > user?
> 
> Hm it should be a plain spin_lock and a _bh variant on the external API.
> 
> But the purpose is to allow atomic sampling of accumulated busy time 
> plus the current engine status.

You can do that with a lockless read, if you treat it as a seqlock and
the total as the latch. Something like

	u64 total, latch, start;

	total = engine->stats.total;
	do {
		latch = total;
		start = engine->stats.start;
		smp_rmb();
		total = engine->stats.total;
	} while (total != latch);
	total += ktime_now() - latch;

Assuming x86-64. If we only have 32b atomic reads, it is less fun.

> >> +               if (engine->stats.ref++ == 0)
> >> +                       engine->stats.start = ktime_get_real_ns();
> > 
> > Use ktime_get_raw() and leave the conversion to ns to the caller.
> 
> You mean both store in ktime_t and don't fetch the wall time but 
> monotonic? Do you say this because perf pmu perhaps needs monotonic or 
> for some other reason?

We only want monotonic (otherwise we will observe time going backwards), but
whether or not we want the clock source compensation? I was under the
impression we could defer that compensation until later.

> > What is the cost of a ktime nowadays?
> 
> I don't see it in the profiles, so not much I think.

readtsc is there, but not enough to worry. The effect on execution
latency is in the noise on initial inspection.
-Chris


More information about the Intel-gfx mailing list