[Mesa-dev] Perfetto CPU/GPU tracing

Mark Janes mark.a.janes at intel.com
Fri Feb 12 20:03:38 UTC 2021


I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.

It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?  Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker ?  Can it be extended to show i915 and
i915-perf-recorder events?
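
For reference, the kind of userspace marker I mean can be sketched in a
few lines of C. This is a minimal sketch, not what is in my local Mesa
hack: the helper name write_trace_marker is made up for illustration,
and it assumes the caller has write permission on the marker file.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one marker string to an ftrace marker file
 * such as /sys/kernel/tracing/trace_marker.
 * Returns 0 on success, -1 if the file cannot be opened or fully written.
 */
static int write_trace_marker(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    /* Each write() to trace_marker is recorded as one trace event. */
    ssize_t n = write(fd, msg, len);
    close(fd);

    return (n == (ssize_t)len) ? 0 : -1;
}
```

A caller would use something like
write_trace_marker("/sys/kernel/tracing/trace_marker", "mesa: frame_begin\n").
Real instrumentation would likely keep the file descriptor open rather
than reopening it for every event.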

John Bates <jbates at chromium.org> writes:

> I recently opened issue 4262
> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> discussion on integrating perfetto into mesa.
>
> *Background*
>
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system. This helps developers
> quickly answer questions like:
>
>    - How long are frames taking?
>    - What caused a particular frame drop?
>    - Is it CPU bound or GPU bound?
>    - Did a CPU core frequency drop cause something to go slower than usual?
>    - Is something else running that is stealing CPU or GPU time? Could I
>    fix that with better thread/context priorities?
>    - Are all CPU cores being used effectively? Do I need sched_setaffinity
>    to keep my thread on a big or little core?
>    - What’s the latency between CPU frame submit and GPU start?
>
> *What Does Mesa + Perfetto Provide?*
>
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa, so that if your system has
> perfetto's traced daemon running, you just need to run perfetto (perhaps
> along with setting an environment variable) with the mesa categories to see:
>
>    - GPU processing timeline events.
>    - GPU counters.
>    - CPU events for potentially slow functions in mesa like shader compiles.
>
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
>
> *Runtime Characteristics*
>
>    - ~500KB additional binary size. Even when using only the basic features
>    of perfetto, the binary size of mesa increases by about 500KB.
>    - Background thread. Perfetto uses a background thread for communication
>    with the system tracing daemon (traced) to advertise trace data and get
>    notification of trace start/stop.
>    - Runtime overhead. When tracing is disabled, each event is designed to
>    cost one predicted branch, typically a few CPU cycles
>    <https://perfetto.dev/docs/instrumentation/track-events#performance>.
>    While enabled, the overhead can be around 1 us per event.
>
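
The "one predicted branch when disabled" pattern can be illustrated with
a small C sketch. All names here (trace_enabled, TRACE_EVENT,
trace_emit_slow, trace_set_enabled) are hypothetical and are not the
Perfetto or Percetto API. It also assumes GCC or Clang for
__builtin_expect.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global switch, flipped when a trace session starts/stops. */
static atomic_bool trace_enabled;

/* Stand-in for the slow path; a real one would serialize the event. */
static unsigned long events_emitted;

static void trace_emit_slow(const char *name)
{
    (void)name;
    events_emitted++;
}

static void trace_set_enabled(bool on)
{
    atomic_store_explicit(&trace_enabled, on, memory_order_relaxed);
}

/*
 * Fast path: one relaxed load and one branch hinted as "not taken".
 * When tracing is off, the slow path is never called.
 */
#define TRACE_EVENT(name)                                              \
    do {                                                               \
        if (__builtin_expect(atomic_load_explicit(&trace_enabled,      \
                                 memory_order_relaxed), 0))            \
            trace_emit_slow(name);                                     \
    } while (0)
```

With tracing off, TRACE_EVENT("shader_compile") only performs the load
and the branch. After trace_set_enabled(true), the same call reaches the
slow path.
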
> *Integration Challenges*
>
>    - The perfetto SDK is C++ and designed around macros, lambdas, inline
>    templates, etc. There are ongoing discussions on providing an official
>    perfetto C API, but it is not yet clear when this will land on the perfetto
>    roadmap.
>    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>    lines of code.
>    - Anything that includes perfetto.h takes a long time to compile.
>    - The current Perfetto SDK design is incompatible with being a shared
>    library behind a C API.
>
> *Percetto*
>
> The percetto library <https://github.com/olvaffe/percetto> was recently
> implemented to provide an interim C API for perfetto. It provides efficient
> support for scoped trace events, multiple categories, counters, custom
> timestamps, and debug data annotations. Percetto also provides some
> features that are important to mesa, but not available yet with perfetto
> SDK:
>
>    - Trace events from multiple perfetto instances in separate shared
>    libraries (like mesa and virglrenderer) show correctly in a single process
>    and thread view.
>    - Counter tracks and macro API.
>
> Percetto is missing an API for perfetto's GPU DataSource and counter support,
> but that feature could be implemented next if it is important for mesa.
> With the existing percetto API mesa could present GPU trace data as named
> 'slice' events and int64_t counters with custom timestamps as shown in the
> image above (based on this sample
> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
>
> *Mesa Integration Alternatives*
>
> Note: we have some pressing needs for performance analysis in Chrome OS, so
> I'm intentionally leaving out the alternative of waiting for an official
> perfetto C API. Of course, once that C API is available it would become an
> option to migrate to it from any of the alternatives below.
>
> Ordered by difficulty with easiest first:
>
>    1. Statically link with percetto as an optional external dependency
>    (virglrenderer now has this approach
>    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>       - Pros: API already supports most common tracing needs. Tested and used
>       by an increasing number of CrOS components.
>       - Cons: External dependency for optional mesa build option.
>    2. Embed Perfetto SDK + a Percetto fork/copy.
>       - Pros: API already supports most common tracing needs. No added
>       external dependency for mesa.
>       - Cons: Percetto code divergence, bug fixes need to land in two trees.
>    3. Embed Perfetto SDK + custom C wrapper.
>       - Pros: Tailored API for mesa's needs.
>       - Cons: Nontrivial development efforts and maintenance.
>    4. Generate C stubs for the Perfetto protobuf and reimplement the
>    Perfetto SDK in C.
>       - Pros: Tailored API for mesa's needs. Possible smaller binary impact
>       from simpler implementation.
>       - Cons: Significant development efforts and maintenance.
>
> Regardless of the integration direction, I expect we would disable perfetto
> in the default build for now to minimize disruption.
>
> I like #1, because there are some nontrivial subtleties to the C wrapper
> that provide both API conveniences and runtime performance that would need
> to be reimplemented or maintained with the other options. I will also
> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
>
> Any other thoughts on how best to integrate perfetto into mesa?
>
> -jb