[Intel-gfx] [RFC v2 0/6] Queued/runnable/running engine stats

Mon Jan 22 18:52:54 UTC 2018

Quoting Tvrtko Ursulin (2018-01-22 18:43:52)
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Per-engine queue depths are an interesting metric for analyzing the system load
> and also for users who wish to load balance their submissions based on them.
> 
> In this version I have split the metrics into three separate counters:
> 
> 1. QUEUED - From execbuf time until the request becomes runnable, meaning its
>             dependencies have been resolved and fences signaled.
> 2. RUNNABLE - From runnable to running on the GPU.
> 3. RUNNING - Running on the GPU.
> 
> When inspected with perf stat the output looks roughly like this:
> 
> #           time             counts unit events
>    201.160490145               0.01      i915/rcs0-queued/
>    201.160490145              19.13      i915/rcs0-runnable/
>    201.160490145               2.39      i915/rcs0-running/
> 
> The reported numbers are average queue depths for the last query period.
> 
> Having the metrics split out should be more flexible for all users, and it is
> still possible to fetch an atomic snapshot of all three using perf groups, for
> those wanting to combine them.
> 
> For users wanting instantaneous numbers instead of averaged, we could potentially
> expose them using the query API Lionel is working on.
> (https://patchwork.freedesktop.org/series/36622/)
> 
> For instance a query packet could look like:
> 
> #define DRM_I915_QUERY_ENGINE_QUEUES            0x04
> 
> struct drm_i915_query_engine_queues {
>         __u8 class;
>         __u8 instance;
> 
>         __u8 pad[2];
> 
>         __u32 queued;
>         __u32 runnable;
>         __u32 running;
> };
> 
> I also have patches to expose this via intel-gpu-top, using the perf API.

Can you stick a ewma loadavg just after the hostname in intel-gpu-overlay,
pretty please? :)
-Chris
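
For reference, the EWMA load average requested above could be sketched roughly
as follows. This is only an illustration, not code from i915 or
intel-gpu-overlay: the struct ewma_load type, the ewma_init() and ewma_update()
helpers and the alpha smoothing weight are all hypothetical names, and the
sample is assumed to be one averaged queue depth per query period, such as the
values perf stat prints for the counters in the cover letter.

```c
/*
 * Hypothetical EWMA helper; none of these names exist in i915 or
 * intel-gpu-overlay. alpha is the weight given to the newest sample,
 * in the range (0, 1]: larger values react faster, smaller values
 * smooth more.
 */
struct ewma_load {
	double value;	/* current smoothed queue depth */
	double alpha;	/* weight of the newest sample */
	int primed;	/* zero until the first sample has been seen */
};

static void ewma_init(struct ewma_load *l, double alpha)
{
	l->value = 0.0;
	l->alpha = alpha;
	l->primed = 0;
}

/* Fold one averaged queue depth sample into the running average. */
static double ewma_update(struct ewma_load *l, double sample)
{
	if (!l->primed) {
		/* Seed with the first sample to avoid a slow ramp from zero. */
		l->value = sample;
		l->primed = 1;
	} else {
		l->value += l->alpha * (sample - l->value);
	}

	return l->value;
}
```

Calling ewma_update() once per query period with, for example, the sum of the
runnable and running depths would give a single loadavg-like number per engine.
Which counters to combine and what alpha to use are open choices here, not
something the series defines.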