[Intel-gfx] [PATCH v3] drm/i915/pmu: Reconstruct active state on starting busy-stats
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Jan 12 09:40:40 UTC 2018
On 12/01/2018 09:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-12 08:16:19)
>>
>> On 11/01/2018 07:30, Chris Wilson wrote:
>>> We have a hole in our busy-stat accounting: if the pmu is enabled during
>>> a long-running batch, the pmu will not start accumulating busy-time
>>> until the next context switch. This then fails tests that only sample
>>> a single batch.
>>>
>>> v2: Count each active port just once (context in/out events are only on
>>> the first and last assignment to a port).
>>> v3: Avoid hardcoding knowledge of 2 submission ports
>>>
>>> Fixes: 30e17b7847f5 ("drm/i915: Engine busy time tracking")
>>> Testcase: igt/perf_pmu/busy-start
>>> Testcase: igt/perf_pmu/busy-double-start
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_engine_cs.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> index 6bb51a502b8b..d790bdc227ff 100644
>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>> @@ -1951,8 +1951,22 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine)
>>> spin_lock_irqsave(&engine->stats.lock, flags);
>>> if (engine->stats.enabled == ~0)
>>> goto busy;
>>> - if (engine->stats.enabled++ == 0)
>>> + if (engine->stats.enabled++ == 0) {
>>> + struct intel_engine_execlists *execlists = &engine->execlists;
>>> + const struct execlist_port *port = execlists->port;
>>> + unsigned int num_ports = execlists_num_ports(execlists);
>>> +
>>> engine->stats.enabled_at = ktime_get();
>>> +
>>> + /* XXX submission method oblivious? */
>>> + while (num_ports-- && port_isset(port)) {
>>> + engine->stats.active++;
>>> + port++;
>>> + }
>>
>> Argh, engine->timeline->lock is required to safely do this. But it needs
>> to be outside the engine->stats.lock. I can't think of any problems
>> doing it at the moment. What do you think?
>
> This isn't even protected by engine->timeline->lock. Writes to the ports
> are serialised by the tasklet alone.

Yes, my bad.

> Note that this is a pure read, so it won't explode, just potentially
> miscalculate active and never end. However, if you look at the other
> side, update of stats.active by the tasklet is serialised by stats.lock,
> is that enough to argue with?
Nice one for early morning.. :)
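For reference, both accounting helpers take stats.lock around the
counter update; roughly like this (a simplified sketch of the helpers
added in 30e17b7847f5, from memory, not verbatim):

static inline void intel_engine_context_in(struct intel_engine_cs *engine)
{
	unsigned long flags;

	if (READ_ONCE(engine->stats.enabled) == 0)
		return;

	spin_lock_irqsave(&engine->stats.lock, flags);
	if (engine->stats.enabled > 0) {
		/* First context in marks the start of a busy period. */
		if (engine->stats.active++ == 0)
			engine->stats.start = ktime_get();
	}
	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

static inline void intel_engine_context_out(struct intel_engine_cs *engine)
{
	unsigned long flags;

	if (READ_ONCE(engine->stats.enabled) == 0)
		return;

	spin_lock_irqsave(&engine->stats.lock, flags);
	if (engine->stats.enabled > 0) {
		/* Last context out folds the elapsed time into the total. */
		if (--engine->stats.active == 0)
			engine->stats.total =
				ktime_add(engine->stats.total,
					  ktime_sub(ktime_get(),
						    engine->stats.start));
	}
	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

So stats.active itself only ever changes under stats.lock; the question
is where the enable-time sampling of the ports can land relative to the
tasklet's port writes and these calls. Take the completion side with the
current ordering: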
    port0 context complete
    context_out - not enabled, no-op
    stats enable - port0 busy, active = 1
    port0 clear

    submit
    context_in - active = 2 !!! BAD
    port0 set
So a problem, no? Do we need to change the ordering of port updates vs
the context_in/out calls? Taking the completion side again, with
context_out placed either a) before or b) after the port0 clear:

    port0 context complete
    stats enable - port0 busy, active = 1
    a) context_out - enabled, active = 0
    port0 clear
    b) context_out - enabled, active = 0

    submit
    context_in - active = 1 GOOD
    port0 set
From the submit side, current ordering:

    port0 submit
    context_in - not enabled, no-op
    stats enable - port0 not busy, active = 0
    port0 set

    port0 context complete
    context_out - active = -1 !!! BAD
Opposite ordering (port update before context_in), with a) and b)
marking the two places the stats enable can land:

    port0 submit
    a) stats enable - port0 not busy, active = 0
    a) port0 set
    a) context_in - active = 1 OK

    b) port0 set
    b) stats enable - port0 busy, active = 1
    b) context_in - active = 2 !!! BAD
So the submit side doesn't work with either ordering, unless I am
missing something. We would need the pair of port manipulation and
context_in to be atomic.
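Something along these lines, perhaps; a hypothetical and untested
sketch only (port_assign_and_account is my invention here, folding the
accounting into the same stats.lock section as the port write):

/*
 * Do the port update and the busy-stats accounting in one critical
 * section of engine->stats.lock, so that intel_enable_engine_stats(),
 * which samples the ports while holding the same lock, can never see
 * the port and stats.active in disagreement.
 */
static void port_assign_and_account(struct intel_engine_cs *engine,
				    struct execlist_port *port,
				    struct drm_i915_gem_request *rq)
{
	unsigned long flags;

	spin_lock_irqsave(&engine->stats.lock, flags);

	/* Per v2, a port is counted only on its first assignment. */
	if (!port_isset(port) && engine->stats.enabled > 0) {
		if (engine->stats.active++ == 0)
			engine->stats.start = ktime_get();
	}

	port_set(port, port_pack(i915_gem_request_get(rq),
				 port_count(port)));

	spin_unlock_irqrestore(&engine->stats.lock, flags);
}

The enable path in this patch already reads the ports under stats.lock,
so with something like the above the submit side could no longer race
with it; the completion side would then want the matching treatment for
context_out and the port clear.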
Regards,
Tvrtko