[Intel-gfx] [PATCH v3] drm/i915/pmu: Reconstruct active state on starting busy-stats
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Jan 12 09:40:40 UTC 2018
On 12/01/2018 09:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 08:16:19)
>>
>> On 11/01/2018 07:30, Chris Wilson wrote:
>>> We have a hole in our busy-stat accounting: if the pmu is enabled during
>>> a long-running batch, the pmu will not start accumulating busy-time
>>> until the next context switch. This then fails tests that only sample
>>> a single batch.
>>>
>>> v2: Count each active port just once (context in/out events are only on
>>> the first and last assignment to a port).
>>> v3: Avoid hardcoding knowledge of 2 submission ports
>>>
>>> Fixes: 30e17b7847f5 ("drm/i915: Engine busy time tracking")
>>> Testcase: igt/perf_pmu/busy-start
>>> Testcase: igt/perf_pmu/busy-double-start
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_engine_cs.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> index 6bb51a502b8b..d790bdc227ff 100644
>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> @@ -1951,8 +1951,22 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>>> spin_lock_irqsave(&engine->stats.lock, flags);
>>> if (engine->stats.enabled == ~0)
>>> goto busy;
>>> - if (engine->stats.enabled++ == 0)
>>> + if (engine->stats.enabled++ == 0) {
>>> + struct intel_engine_execlists *execlists = &engine->execlists;
>>> + const struct execlist_port *port = execlists->port;
>>> + unsigned int num_ports = execlists_num_ports(execlists);
>>> +
>>> engine->stats.enabled_at = ktime_get();
>>> +
>>> + /* XXX submission method oblivious? */
>>> + while (num_ports-- && port_isset(port)) {
>>> + engine->stats.active++;
>>> + port++;
>>> + }
>>
>> Argh, engine->timeline->lock is required to safely do this. But it needs
>> to be outside the engine->stats.lock. I can't think of any problems
>> doing it at the moment. What do you think?
>
> This isn't even protected by engine->timeline->lock. Writes to the ports
> are serialised by the tasklet alone.

Yes, my bad.

> Note that this is a pure read, so it won't explode, just potentially
> miscalculate active and never end. However, if you look at the other
> side, update of stats.active by the tasklet is serialised by stats.lock,
> is that enough to argue with?
Nice one for early morning.. :)
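For reference, both accounting helpers take stats.lock around the
counter update; roughly like this (a simplified sketch of the helpers
added in 30e17b7847f5, from memory, not verbatim):

static inline void intel_engine_context_in(struct intel_engine_cs *engine)
{
	unsigned long flags;

	if (READ_ONCE(engine->stats.enabled) == 0)
		return;

	spin_lock_irqsave(&engine->stats.lock, flags);
	if (engine->stats.enabled > 0) {
		/* First context in marks the start of a busy period. */
		if (engine->stats.active++ == 0)
			engine->stats.start = ktime_get();
	}
	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

static inline void intel_engine_context_out(struct intel_engine_cs *engine)
{
	unsigned long flags;

	if (READ_ONCE(engine->stats.enabled) == 0)
		return;

	spin_lock_irqsave(&engine->stats.lock, flags);
	if (engine->stats.enabled > 0) {
		/* Last context out folds the elapsed time into the total. */
		if (--engine->stats.active == 0)
			engine->stats.total =
				ktime_add(engine->stats.total,
					  ktime_sub(ktime_get(),
						    engine->stats.start));
	}
	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

So stats.active itself only ever changes under stats.lock; the question
is where the enable-time sampling of the ports can land relative to the
tasklet's port writes and these calls. Take the completion side with the
current ordering: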
    port0 context complete
    context_out - not enabled, no-op
    stats enable - port0 busy, active = 1
    port0 clear

    submit
    context_in - active = 2 !!! BAD
    port0 set
So a problem, no? Do we need to change the ordering of port updates vs
the context_in/out calls? Taking the completion side again, with
context_out placed either a) before or b) after the port0 clear:

    port0 context complete
    stats enable - port0 busy, active = 1
    a) context_out - enabled, active = 0
    port0 clear
    b) context_out - enabled, active = 0

    submit
    context_in - active = 1 GOOD
    port0 set
From the submit side, current ordering:

    port0 submit
    context_in - not enabled, no-op
    stats enable - port0 not busy, active = 0
    port0 set

    port0 context complete
    context_out - active = -1 !!! BAD
Opposite ordering (port update before context_in), with a) and b)
marking the two places the stats enable can land:

    port0 submit
    a) stats enable - port0 not busy, active = 0
    a) port0 set
    a) context_in - active = 1 OK

    b) port0 set
    b) stats enable - port0 busy, active = 1
    b) context_in - active = 2 !!! BAD
So the submit side doesn't work with either ordering, unless I am
missing something. We would need the pair of port manipulation and
context_in to be atomic.
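Something along these lines, perhaps; a hypothetical and untested
sketch only (port_assign_and_account is my invention here, folding the
accounting into the same stats.lock section as the port write):

/*
 * Do the port update and the busy-stats accounting in one critical
 * section of engine->stats.lock, so that intel_enable_engine_stats(),
 * which samples the ports while holding the same lock, can never see
 * the port and stats.active in disagreement.
 */
static void port_assign_and_account(struct intel_engine_cs *engine,
				    struct execlist_port *port,
				    struct drm_i915_gem_request *rq)
{
	unsigned long flags;

	spin_lock_irqsave(&engine->stats.lock, flags);

	/* Per v2, a port is counted only on its first assignment. */
	if (!port_isset(port) && engine->stats.enabled > 0) {
		if (engine->stats.active++ == 0)
			engine->stats.start = ktime_get();
	}

	port_set(port, port_pack(i915_gem_request_get(rq),
				 port_count(port)));

	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

The enable path in this patch already reads the ports under stats.lock,
so with something like the above the submit side could no longer race
with it; the completion side would then want the matching treatment for
context_out and the port clear.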
Regards,
Tvrtko