[Mesa-dev] Perfetto CPU/GPU tracing

Tamminen, Eero T eero.t.tamminen at intel.com
Fri Feb 12 13:01:22 UTC 2021


On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> I recently opened issue 4262
> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> discussion on integrating perfetto into mesa.
> *Background*
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of
> trace data from kernel ftrace, GPU driver and various
> manually-instrumented tracepoints throughout the application and system.

Unlike some other Linux tracing solutions, Perfetto appears to be for
Android / Chrome(OS?), and is not available in common Linux distros.

So, why Perfetto instead of one of the other solutions, e.g. from ones
mentioned here:

And, if a tracing API is added to Mesa, shouldn't it also support
tracepoints for other tracing solutions?

I mean, code added to drivers themselves preferably should not have
anything perfetto/percetto specific.  Tracing system specific code
should be only in one place (even if it's just macros in common header).
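
To illustrate what "only in one place" could look like, here is a minimal
sketch of a backend-neutral header. All names in it (TRACE_BACKEND,
TRACE_BEGIN, TRACE_END, compile_shader_stub) are hypothetical and not
existing Mesa or percetto API. The logging backend is only a stand-in for
where a percetto, LTTng or other backend would be wired in.

```c
#include <stdio.h>

/* Hypothetical backend selection; a real build system would set this. */
#define TRACE_BACKEND_NONE 0
#define TRACE_BACKEND_LOG  1   /* stand-in for percetto, LTTng, ... */

#ifndef TRACE_BACKEND
#define TRACE_BACKEND TRACE_BACKEND_LOG
#endif

#if TRACE_BACKEND == TRACE_BACKEND_LOG
/* Counts emitted events so the sketch can be checked without a tracer. */
static unsigned trace_events_emitted;
#define TRACE_BEGIN(name) \
    do { trace_events_emitted++; fprintf(stderr, "B %s\n", (name)); } while (0)
#define TRACE_END(name) \
    do { trace_events_emitted++; fprintf(stderr, "E %s\n", (name)); } while (0)
#else
/* Tracing compiled out: the macros expand to nothing useful. */
#define TRACE_BEGIN(name) do { (void)(name); } while (0)
#define TRACE_END(name)   do { (void)(name); } while (0)
#endif

/* Driver code only ever sees the neutral macros, never a backend API. */
static int compile_shader_stub(int input)
{
    TRACE_BEGIN("compile_shader");
    int result = input * 2;   /* placeholder for real work */
    TRACE_END("compile_shader");
    return result;
}
```

With this layout, switching tracing systems means touching the header
only, and the driver call sites stay unchanged.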

> This helps developers
> quickly answer questions like:
>    - How long are frames taking?

That doesn't require any changes to Mesa.  Just set a uprobe on a suitable
buffer swap function [1] and parse the kernel ftrace events.  This way,
starting tracing doesn't even require restarting the tracked processes.

[1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT, 

[2] Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
function and call the backend function like "anv_QueuePresentKHR"
directly, so it's better to track the latter instead.
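
As a rough sketch of the "parse kernel ftrace events" part: assuming the
uprobe was named "swap" (a made-up name) and the trace was saved in the
usual ftrace text format ("TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: ..."),
frame times fall out of the difference between consecutive timestamps.
The helper names below are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* If `line` reports `event`, store its timestamp (seconds) in *ts and
 * return 1; otherwise return 0.  Expects "... 123.456789: event: ...". */
static int ftrace_event_time(const char *line, const char *event, double *ts)
{
    size_t elen = strlen(event);
    const char *p = line;

    /* Look for ": <event>:" so that substrings of other names don't match. */
    while ((p = strstr(p, event)) != NULL) {
        if (p > line + 1 && p[-1] == ' ' && p[-2] == ':' && p[elen] == ':')
            break;
        p++;
    }
    if (!p)
        return 0;

    /* Walk back over the digits and dot that precede the ':'. */
    const char *end = p - 2;
    const char *start = end;
    while (start > line &&
           (start[-1] == '.' || (start[-1] >= '0' && start[-1] <= '9')))
        start--;
    if (start == end)
        return 0;

    *ts = strtod(start, NULL);
    return 1;
}

/* Store the time between consecutive `event` lines, in milliseconds.
 * Returns the number of frame times written (at most `cap`). */
static size_t frame_times_ms(const char *const *lines, size_t n,
                             const char *event, double *out, size_t cap)
{
    size_t frames = 0;
    int have_prev = 0;
    double prev = 0.0, now;

    for (size_t i = 0; i < n; i++) {
        if (!ftrace_event_time(lines[i], event, &now))
            continue;
        if (have_prev && frames < cap)
            out[frames++] = (now - prev) * 1000.0;
        prev = now;
        have_prev = 1;
    }
    return frames;
}
```

Lines for other events (sched_switch etc.) are simply skipped, so the
same trace can be used for both frame timing and scheduling analysis.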

>    - What caused a particular frame drop?
>    - Is it CPU bound or GPU bound?

That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
utilization (which is a lower-level thing).

>    - Did a CPU core frequency drop cause something to go slower than
>    usual?

Note that nowadays actual CPU frequencies are often controlled by HW /
firmware, so you don't necessarily get any ftrace event from a frequency
change.  You would need to poll MSR registers instead (which is a
privileged operation, and polling can easily miss changes).
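
For illustration, one way such polling could look on x86 Linux is sketched
below.  It assumes the "msr" kernel module is loaded, that the process has
the privileges to read /dev/cpu/N/msr, and that the APERF/MPERF counter
pair is the thing being polled (that choice is my assumption for the
example).  The base frequency has to be supplied by the caller, and the
helper names are made up.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Architectural MSR addresses of the APERF/MPERF counter pair. */
#define MSR_IA32_MPERF 0xE7
#define MSR_IA32_APERF 0xE8

/* Read one 64-bit MSR through the Linux msr driver.  The file offset
 * selects the register.  Returns 0 on success, -1 on failure. */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[32];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, value, sizeof(*value), reg);
    close(fd);
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

/* Average effective frequency between two samples:
 * base * (delta APERF / delta MPERF).  Returns 0.0 if MPERF did not
 * advance (e.g. the core was idle for the whole interval). */
static double effective_mhz(double base_mhz,
                            uint64_t aperf0, uint64_t mperf0,
                            uint64_t aperf1, uint64_t mperf1)
{
    uint64_t da = aperf1 - aperf0;
    uint64_t dm = mperf1 - mperf0;
    return dm ? base_mhz * (double)da / (double)dm : 0.0;
}
```

This only gives an average over the polling interval, so short frequency
dips between two samples stay invisible, which is the "can easily miss
changes" problem.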

>    - Is something else running that is stealing CPU or GPU time? Could I
>    fix that with better thread/context priorities?
>    - Are all CPU cores being used effectively? Do I need
>    sched_setaffinity to keep my thread on a big or little core?

I don't think these require adding tracepoints to Mesa either...

>    - What’s the latency between CPU frame submit and GPU start?

I think this would require tracepoints in kernel GPU code more than in

	- Eero

> *What Does Mesa + Perfetto Provide?*
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa so that if your system has
> perfetto traced running, you just need to run perfetto (perhaps along
> with setting an environment variable) with the mesa categories to see:
>    - GPU processing timeline events.
>    - GPU counters.
>    - CPU events for potentially slow functions in mesa like shader
>    compiles.
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
> *Runtime Characteristics*
>    - ~500KB additional binary size. Even with using only the basic
>    features of perfetto, it will increase the binary size of mesa by
>    about 500KB.
>    - Background thread. Perfetto uses a background thread for
>    communication with the system tracing daemon (traced) to advertise
>    trace data and get notification of trace start/stop.
>    - Runtime overhead when disabled is designed to be optimal with one
>    predicted branch, typically a few CPU cycles
>    <https://perfetto.dev/docs/instrumentation/track-events#performance>
>    per event. While enabled, the overhead can be around 1 us per event.
> *Integration Challenges*
>    - The perfetto SDK is C++ and designed around macros, lambdas, inline
>    templates, etc. There are ongoing discussions on providing an
>    official perfetto C API, but it is not yet clear when this will land
>    on the perfetto roadmap.
>    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>    lines of code.
>    - Anything that includes perfetto.h takes a long time to compile.
>    - The current Perfetto SDK design is incompatible with being a shared
>    library behind a C API.
> *Percetto*
> The percetto library <https://github.com/olvaffe/percetto> was recently
> implemented to provide an interim C API for perfetto. It provides
> efficient support for scoped trace events, multiple categories,
> counters, custom timestamps, and debug data annotations. Percetto also
> provides some features that are important to mesa, but not available
> yet with perfetto SDK:
>    - Trace events from multiple perfetto instances in separate shared
>    libraries (like mesa and virglrenderer) show correctly in a single
>    process and thread view.
>    - Counter tracks and macro API.
> Percetto is missing API for perfetto's GPU DataSource and counter
> support, but that feature could be implemented next if it is important
> for mesa. With the existing percetto API mesa could present GPU trace
> data as named 'slice' events and int64_t counters with custom
> timestamps as shown in the image above (based on this sample
> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> *Mesa Integration Alternatives*
> Note: we have some pressing needs for performance analysis in Chrome OS,
> so I'm intentionally leaving out the alternative of waiting for an
> official perfetto C API. Of course, once that C API is available it
> would become an option to migrate to it from any of the alternatives
> below.
> Ordered by difficulty with easiest first:
>    1. Statically link with percetto as an optional external dependency
>    (virglrenderer now has this approach
>    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>       - Pros: API already supports most common tracing needs. Tested and
>       used by an increasing number of CrOS components.
>       - Cons: External dependency for optional mesa build option.
>    2. Embed Perfetto SDK + a Percetto fork/copy.
>       - Pros: API already supports most common tracing needs. No added
>       external dependency for mesa.
>       - Cons: Percetto code divergence, bug fixes need to land in two
>       trees.
>    3. Embed Perfetto SDK + custom C wrapper.
>       - Pros: Tailored API for mesa's needs.
>       - Cons: Nontrivial development efforts and maintenance.
>    4. Generate C stubs for the Perfetto protobuf and reimplement the
>    Perfetto SDK in C.
>       - Pros: Tailored API for mesa's needs. Possible smaller binary
>       impact from simpler implementation.
>       - Cons: Significant development efforts and maintenance.
> Regardless of the integration direction, I expect we would disable
> perfetto in the default build for now to minimize disruption.
> I like #1, because there are some nontrivial subtleties to the C wrapper
> that provide both API conveniences and runtime performance that would
> need to be reimplemented or maintained with the other options. I will
> also volunteer to do #1 or #2, but I'm not sure I have time for #3 or
> #4 :D.
> Any other thoughts on how best to integrate perfetto into mesa?
> -jb
