[Intel-gfx] [RFC 08/12] drm/i915: Expose per-engine client busyness

Wed Mar 11 10:17:21 UTC 2020

On 10/03/2020 20:12, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-03-10 20:04:23)
>>
>> On 10/03/2020 18:32, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-03-09 18:31:25)
>>>> +static ssize_t
>>>> +show_client_busy(struct device *kdev, struct device_attribute *attr, char *buf)
>>>> +{
>>>> +       struct i915_engine_busy_attribute *i915_attr =
>>>> +               container_of(attr, typeof(*i915_attr), attr);
>>>> +       unsigned int class = i915_attr->engine_class;
>>>> +       struct i915_drm_client *client = i915_attr->client;
>>>> +       u64 total = atomic64_read(&client->past_runtime[class]);
>>>> +       struct list_head *list = &client->ctx_list;
>>>> +       struct i915_gem_context *ctx;
>>>> +
>>>> +       rcu_read_lock();
>>>> +       list_for_each_entry_rcu(ctx, list, client_link) {
>>>> +               total += atomic64_read(&ctx->past_runtime[class]);
>>>> +               total += pphwsp_busy_add(ctx, class);
>>>> +       }
>>>> +       rcu_read_unlock();
>>>> +
>>>> +       total *= RUNTIME_INFO(i915_attr->i915)->cs_timestamp_period_ns;
>>>
>>> Planning early retirement? In 600 years, they'll have forgotten how to
>>> email ;)
>>
>> Shruggety shrug. :) I am guessing you would prefer both internal
>> representations (sw and pphwsp runtimes) to be consistently in
>> nanoseconds? I thought why multiply at various places when once at the
>> readout time is enough.
> 
> It's fine. I was just double checking overflow, and then remembered the
> end result is 64b nanoseconds.
> 
> Keep the internal representation convenient for accumulation, and the
> conversion at the boundary.
>   
>> And I should mention again how I am not sure at the moment how to meld
>> the two stats into one more "perfect" output.
> 
> One of the things that crossed my mind was wondering if it was possible
> to throw in a pulse before reading the stats (if active etc). Usual
> dilemma with non-preemptible contexts, so probably not worth it as those
> hogs will remain hogs.
> 
> And I worry about the disparity between sw busy and hw runtime.

How about I stop tracking accumulated sw runtime and just use it for the
active portion? That is, report back hw runtime + sw active runtime. In
other words, sw tracking would only cover the portion between context_in
and context_out. Sounds worth a try.
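
A rough sketch of that split, purely to illustrate the idea. All names
here (struct ctx_busy, ctx_in, ctx_out, ctx_busy_ns) are hypothetical and
not from the patch. It assumes the hw runtime is kept in CS timestamp
ticks and refreshed at context_out, while the sw side only remembers when
the context was last switched in:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-context state; not the structures used in the patch. */
struct ctx_busy {
	uint64_t hw_runtime_ticks; /* hw-reported total, in CS timestamp ticks */
	uint64_t sw_in_ns;         /* sw timestamp of the last context_in */
	bool active;               /* true between context_in and context_out */
};

/* context_in: sw tracking starts covering the active portion. */
static void ctx_in(struct ctx_busy *c, uint64_t now_ns)
{
	c->sw_in_ns = now_ns;
	c->active = true;
}

/*
 * context_out: hw has saved an up-to-date runtime, so take that total and
 * drop the sw estimate instead of accumulating it.
 */
static void ctx_out(struct ctx_busy *c, uint64_t hw_ticks_total)
{
	c->hw_runtime_ticks = hw_ticks_total;
	c->active = false;
}

/*
 * Readout: hw runtime converted to ns at the boundary, plus the sw-tracked
 * active portion if the context is currently running.
 */
static uint64_t ctx_busy_ns(const struct ctx_busy *c, uint64_t now_ns,
			    uint32_t period_ns)
{
	uint64_t total = c->hw_runtime_ticks * period_ns;

	if (c->active && now_ns > c->sw_in_ns)
		total += now_ns - c->sw_in_ns;

	return total;
}
```

With this shape the sw and hw numbers never both cover the same interval,
so any disparity between them is limited to the currently active portion.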

Regards,

Tvrtko