[Mesa-dev] Perfetto CPU/GPU tracing
Mark Janes
mark.a.janes at intel.com
Fri Feb 12 20:03:38 UTC 2021
I've recently been using GPUVis to look at trace events. On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.
It is very easy to compile the GPUVis UI. Userspace instrumentation
requires a single C/C++ header. You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).
Is it possible to build and run the Perfetto UI locally? Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker? Can it be extended to show i915 and
i915-perf-recorder events?
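
For reference, the userspace markers mentioned above amount to writing a
line of text into the trace_marker file. A minimal sketch in C, assuming
tracefs is mounted at /sys/kernel/tracing and the process is allowed to
write there; the helper name trace_marker_write and the message text are
made up for illustration and are not taken from GPUVis or Mesa:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write one message to an ftrace marker file.
 * Returns 0 on success, -1 if the file cannot be opened or the write
 * is short. The path is a parameter so the helper can also be pointed
 * at an ordinary file when experimenting. */
int trace_marker_write(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);

    return (written == (ssize_t)len) ? 0 : -1;
}
```

A call would look like
trace_marker_write("/sys/kernel/tracing/trace_marker", "mesa: frame begin\n").
Real instrumentation would normally keep the file descriptor open rather
than reopening it for every event.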
John Bates <jbates at chromium.org> writes:
> I recently opened issue 4262
> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> discussion on integrating perfetto into mesa.
>
> *Background*
>
> System-wide tracing is an invaluable tool for developers to find and fix
> performance problems. The perfetto project enables a combined view of trace
> data from kernel ftrace, GPU driver and various manually-instrumented
> tracepoints throughout the application and system. This helps developers
> quickly answer questions like:
>
> - How long are frames taking?
> - What caused a particular frame drop?
> - Is it CPU bound or GPU bound?
> - Did a CPU core frequency drop cause something to go slower than usual?
> - Is something else running that is stealing CPU or GPU time? Could I
> fix that with better thread/context priorities?
> - Are all CPU cores being used effectively? Do I need sched_setaffinity
> to keep my thread on a big or little core?
> - What’s the latency between CPU frame submit and GPU start?
>
> *What Does Mesa + Perfetto Provide?*
>
> Mesa is in a unique position to produce GPU trace data for several GPU
> vendors without requiring the developer to build and install additional
> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>
> The key is making it easy for developers to use. Ideally, perfetto is
> eventually available by default in mesa, so that if your system has the
> perfetto daemon (traced) running, you just need to run perfetto (perhaps
> along with setting an environment variable) with the mesa categories to
> see:
>
> - GPU processing timeline events.
> - GPU counters.
> - CPU events for potentially slow functions in mesa like shader compiles.
>
> Example of what this data might look like (with fake GPU events):
> [image: percetto-gpu-example.png]
>
> *Runtime Characteristics*
>
> - ~500KB additional binary size. Even when only the basic features of
> perfetto are used, the mesa binary grows by about 500KB.
> - Background thread. Perfetto uses a background thread for communication
> with the system tracing daemon (traced) to advertise trace data and get
> notification of trace start/stop.
> - Runtime overhead when disabled is designed to be minimal: one
> predicted branch, typically a few CPU cycles
> <https://perfetto.dev/docs/instrumentation/track-events#performance> per
> event. While enabled, the overhead can be around 1 us per event.
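
The "one predicted branch when disabled" cost described above can be
sketched as follows. This is only an illustration of the general pattern,
not the Perfetto or Percetto implementation: struct trace_category,
trace_event_slow and TRACE_EVENT are hypothetical names, the slow path is a
stand-in that just counts events, and __builtin_expect assumes GCC or
Clang.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-category state; not the Perfetto or Percetto API. */
struct trace_category {
    atomic_bool enabled;    /* set when a tracing session starts or stops */
    unsigned long emitted;  /* stand-in for real event output (not thread-safe) */
};

/* Slow path: a real implementation would serialize the event here. */
void trace_event_slow(struct trace_category *cat, const char *name)
{
    (void)name;
    cat->emitted++;
}

/* Fast path: one relaxed atomic load and a branch hinted as not taken,
 * so a disabled category costs only the check itself. */
#define TRACE_EVENT(cat, name)                                         \
    do {                                                               \
        if (__builtin_expect(                                          \
                atomic_load_explicit(&(cat)->enabled,                  \
                                     memory_order_relaxed), 0))        \
            trace_event_slow((cat), (name));                           \
    } while (0)
```

With the flag cleared, TRACE_EVENT(&cat, "draw") falls straight through;
once the flag is set, the same call reaches the slow path.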
>
> *Integration Challenges*
>
> - The perfetto SDK is C++ and designed around macros, lambdas, inline
> templates, etc. There are ongoing discussions on providing an official
> perfetto C API, but it is not yet clear when this will land on the perfetto
> roadmap.
> - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> lines of code.
> - Anything that includes perfetto.h takes a long time to compile.
> - The current Perfetto SDK design is incompatible with being a shared
> library behind a C API.
>
> *Percetto*
>
> The percetto library <https://github.com/olvaffe/percetto> was recently
> implemented to provide an interim C API for perfetto. It provides efficient
> support for scoped trace events, multiple categories, counters, custom
> timestamps, and debug data annotations. Percetto also provides some
> features that are important to mesa, but not yet available in the
> perfetto SDK:
>
> - Trace events from multiple perfetto instances in separate shared
> libraries (like mesa and virglrenderer) show correctly in a single process
> and thread view.
> - Counter tracks and macro API.
>
> Percetto is missing an API for perfetto's GPU DataSource and counter
> support, but that feature could be implemented next if it is important
> for mesa.
> With the existing percetto API mesa could present GPU trace data as named
> 'slice' events and int64_t counters with custom timestamps as shown in the
> image above (based on this sample
> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
>
> *Mesa Integration Alternatives*
>
> Note: we have some pressing needs for performance analysis in Chrome OS, so
> I'm intentionally leaving out the alternative of waiting for an official
> perfetto C API. Of course, once that C API is available it would become an
> option to migrate to it from any of the alternatives below.
>
> Ordered by difficulty with easiest first:
>
> 1. Statically link with percetto as an optional external dependency
>    (virglrenderer now has this approach
>    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>    - Pros: API already supports most common tracing needs. Tested and
>      used by an increasing number of CrOS components.
>    - Cons: External dependency for an optional mesa build option.
> 2. Embed Perfetto SDK + a Percetto fork/copy.
>    - Pros: API already supports most common tracing needs. No added
>      external dependency for mesa.
>    - Cons: Percetto code divergence; bug fixes need to land in two trees.
> 3. Embed Perfetto SDK + custom C wrapper.
>    - Pros: Tailored API for mesa's needs.
>    - Cons: Nontrivial development effort and maintenance.
> 4. Generate C stubs for the Perfetto protobuf and reimplement the
>    Perfetto SDK in C.
>    - Pros: Tailored API for mesa's needs. Possibly smaller binary impact
>      from a simpler implementation.
>    - Cons: Significant development effort and maintenance.
>
> Regardless of the integration direction, I expect we would disable perfetto
> in the default build for now to minimize disruption.
>
> I like #1, because there are some nontrivial subtleties to the C wrapper
> that provide both API conveniences and runtime performance that would need
> to be reimplemented or maintained with the other options. I will also
> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
>
> Any other thoughts on how best to integrate perfetto into mesa?
>
> -jb