[Mesa-dev] Perfetto CPU/GPU tracing
jbates at chromium.org
Fri Feb 12 17:06:28 UTC 2021
On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T <eero.t.tamminen at intel.com>
wrote:
> Unlike some other Linux tracing solutions, Perfetto appears to be for
> Android / Chrome(OS?), and not available in common Linux distros.
> So, why Perfetto instead of one of the other solutions, e.g. from ones
> mentioned here:
Good question. Perfetto is for Linux, Android, and Chrome OS. I'm not sure
which Linux distros package it besides Android and Chrome OS. It provides a
comprehensive tracing solution, from data collection and tools to a
convenient web-based UI and analysis, as well as interoperation with
other trace data providers. Looking at the tracing summit presentations,
for example, there appear to be some good additional tracing data sources
that could potentially feed into the Perfetto trace daemon and UI. But none
of those particular projects provides a comprehensive solution like
Perfetto does. Lots more detail at perfetto.dev.
> And, if a tracing API is added to Mesa, shouldn't it also support
> tracepoints for other tracing solutions?
> I mean, code added to drivers themselves preferably should not have
> anything perfetto/percetto specific. Tracing system specific code
> should be only in one place (even if it's just macros in common header).
I agree it makes sense to keep the macro API implementation in a common
mesa header so that we have the option of swapping out the backend. On the
other hand, it can get difficult to maintain more than one tracing backend,
especially when tracing usage goes beyond the simple TRACE_SCOPE(__func__)
macros, for example with GPU timeline tracks, counters, etc. I would not
expect mesa devs to test their tracing code on more than one tracing
backend, so other backends would be likely to regress. So ideally we
could pick one.
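To make the "common header" idea concrete, here is a minimal sketch of how
driver code could stay backend-agnostic. Everything in it is an assumption
for illustration: the MESA_TRACE_* names, the backend selector, and the
stderr/counter backend are hypothetical and are not existing mesa, perfetto,
or percetto API. The scoped macro relies on the GCC/Clang cleanup attribute.

```c
#include <stdio.h>

/* Hypothetical backend selector; a real build would set this from the
 * build system rather than hard-coding a default here. */
#define MESA_TRACE_BACKEND_NONE   0
#define MESA_TRACE_BACKEND_STDERR 1

#ifndef MESA_TRACE_BACKEND
#define MESA_TRACE_BACKEND MESA_TRACE_BACKEND_STDERR
#endif

#if MESA_TRACE_BACKEND == MESA_TRACE_BACKEND_STDERR

/* Stand-in backend: counts events and prints begin/end markers to stderr. */
static int mesa_trace_depth;
static unsigned mesa_trace_begins;
static unsigned mesa_trace_ends;

static inline void mesa_trace_begin(const char *name)
{
   fprintf(stderr, "%*sB %s\n", mesa_trace_depth * 2, "", name);
   mesa_trace_depth++;
   mesa_trace_begins++;
}

/* Called automatically when the scope variable goes out of scope. */
static inline void mesa_trace_end(const char **name)
{
   mesa_trace_depth--;
   mesa_trace_ends++;
   fprintf(stderr, "%*sE %s\n", mesa_trace_depth * 2, "", *name);
}

/* One scope per block in this sketch; a fuller version would paste
 * __LINE__ into the variable name to allow several per block. */
#define MESA_TRACE_SCOPE(name)                                          \
   const char *mesa_trace_scope_                                        \
      __attribute__((cleanup(mesa_trace_end))) = (name);                \
   mesa_trace_begin(mesa_trace_scope_)

#else

/* Tracing compiled out: the macro evaluates nothing at runtime. */
#define MESA_TRACE_SCOPE(name) do { (void)(name); } while (0)

#endif

/* Driver code only sees the generic macro, never a backend-specific call. */
static int traced_work(int x)
{
   MESA_TRACE_SCOPE(__func__);
   return x * 2;
}
```

Swapping the backend would then mean changing only the block between #if
and #endif. It gets harder once the API grows beyond scoped events, which
is the maintenance concern above.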
> > This helps developers
> > quickly answer questions like:
> > - How long are frames taking?
> That doesn't require any changes to Mesa. Just set a uprobe for a suitable
> buffer swap function, and parse kernel ftrace events. This way
> starting tracing doesn't even require restarting the tracked processes.
>  glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
>  Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
> function and call a backend function like "anv_QueuePresentKHR"
> directly, so it's better to track the latter instead.
> > - What caused a particular frame drop?
> > - Is it CPU bound or GPU bound?
> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
> utilization (which is a lower-level thing).
> > - Did a CPU core frequency drop cause something to go slower than
> > usual?
> Note that nowadays actual CPU frequencies are often controlled by HW /
> firmware, so you don't necessarily get any ftrace event from a freq
> change; you would need to poll MSR registers instead (which is a
> privileged operation, and polling can easily miss changes).
> > - Is something else running that is stealing CPU or GPU time? Could I
> >   fix that with better thread/context priorities?
> > - Are all CPU cores being used effectively? Do I need sched_setaffinity
> >   to keep my thread on a big or little core?
> I don't think these require adding tracepoints to Mesa either...
> > - What’s the latency between CPU frame submit and GPU start?
> I think this would require tracepoints in kernel GPU code more than in
> Mesa.
> - Eero
> > *What Does Mesa + Perfetto Provide?*
> > Mesa is in a unique position to produce GPU trace data for several GPU
> > vendors without requiring the developer to build and install additional
> > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> > The key is making it easy for developers to use. Ideally, perfetto is
> > eventually available by default in mesa so that if your system has
> > perfetto traced running, you just need to run perfetto (perhaps along
> > with setting an environment variable) with the mesa categories to see:
> > - GPU processing timeline events.
> > - GPU counters.
> > - CPU events for potentially slow functions in mesa like shader
> >   compiles.
> > Example of what this data might look like (with fake GPU events):
> > [image: percetto-gpu-example.png]
> > *Runtime Characteristics*
> > - ~500KB additional binary size. Even with using only the basic
> >   features of perfetto, it will increase the binary size of mesa by
> >   about 500KB.
> > - Background thread. Perfetto uses a background thread for
> >   communication with the system tracing daemon (traced) to advertise
> >   trace data and get notification of trace start/stop.
> > - Runtime overhead when disabled is designed to be optimal with one
> >   predicted branch, typically a few CPU cycles
> >   <https://perfetto.dev/docs/instrumentation/track-events#performance>
> >   per event. While enabled, the overhead can be around 1 us per event.
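On the "one predicted branch" point quoted above, here is a minimal sketch of
the general technique, not perfetto's actual implementation. The names
trace_enabled, trace_emit_slow, and trace_event are hypothetical, and
__builtin_expect is a GCC/Clang builtin. The idea is that the disabled path
is a single relaxed atomic load plus a branch hinted as not taken, with all
real work kept out of line.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag; a real backend would flip it when a tracing session
 * starts or stops. Static storage means it starts out false. */
static atomic_bool trace_enabled;

/* Counts events that reached the slow path (stand-in for real emission). */
static unsigned long trace_events_emitted;

/* Slow path: only reached while tracing is enabled. */
static void trace_emit_slow(const char *name)
{
   (void)name; /* a real backend would serialize the event here */
   trace_events_emitted++;
}

/* Fast path: one relaxed load and one branch hinted as unlikely. */
static inline void trace_event(const char *name)
{
   if (__builtin_expect(
          atomic_load_explicit(&trace_enabled, memory_order_relaxed), 0))
      trace_emit_slow(name);
}
```

With the flag clear, a call to trace_event("draw") only performs the load and
the untaken branch. Once the flag is set, the same call goes through
trace_emit_slow().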
> > *Integration Challenges*
> > - The perfetto SDK is C++ and designed around macros, lambdas, inline
> >   templates, etc. There are ongoing discussions on providing an
> >   official perfetto C API, but it is not yet clear when this will land
> >   on the perfetto roadmap.
> > - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> >   lines of code.
> > - Anything that includes perfetto.h takes a long time to compile.
> > - The current Perfetto SDK design is incompatible with being a shared
> >   library behind a C API.
> > *Percetto*
> > The percetto library <https://github.com/olvaffe/percetto> was recently
> > implemented to provide an interim C API for perfetto. It provides
> > efficient support for scoped trace events, multiple categories,
> > counters, custom timestamps, and debug data annotations. Percetto also
> > provides some features that are important to mesa, but not available
> > yet with perfetto SDK:
> > - Trace events from multiple perfetto instances in separate shared
> >   libraries (like mesa and virglrenderer) show correctly in a single
> >   process and thread view.
> > - Counter tracks and macro API.
> > Percetto is missing API for perfetto's GPU DataSource and counter
> > support, but that feature could be implemented next if it is important
> > for mesa. With the existing percetto API mesa could present GPU trace
> > data as named 'slice' events and int64_t counters with custom
> > timestamps as shown in the image above (based on this sample
> > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> > *Mesa Integration Alternatives*
> > Note: we have some pressing needs for performance analysis in Chrome OS,
> > so I'm intentionally leaving out the alternative of waiting for an
> > official perfetto C API. Of course, once that C API is available it
> > would become an option to migrate to it from any of the alternatives
> > below.
> > Ordered by difficulty with easiest first:
> > 1. Statically link with percetto as an optional external dependency
> >    (virglrenderer now has this approach
> >    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> >    - Pros: API already supports most common tracing needs. Tested and
> >      used by an increasing number of CrOS components.
> >    - Cons: External dependency for optional mesa build option.
> > 2. Embed Perfetto SDK + a Percetto fork/copy.
> >    - Pros: API already supports most common tracing needs. No added
> >      external dependency for mesa.
> >    - Cons: Percetto code divergence, bug fixes need to land in two
> >      trees.
> > 3. Embed Perfetto SDK + custom C wrapper.
> >    - Pros: Tailored API for mesa's needs.
> >    - Cons: Nontrivial development efforts and maintenance.
> > 4. Generate C stubs for the Perfetto protobuf and reimplement the
> >    Perfetto SDK in C.
> >    - Pros: Tailored API for mesa's needs. Possible smaller binary
> >      impact from simpler implementation.
> >    - Cons: Significant development efforts and maintenance.
> > Regardless of the integration direction, I expect we would disable
> > perfetto in the default build for now to minimize disruption.
> > I like #1, because there are some nontrivial subtleties to the C wrapper
> > that provide both API conveniences and runtime performance that would
> > need to be reimplemented or maintained with the other options. I will
> > also volunteer to do #1 or #2, but I'm not sure I have time for #3 or
> > #4 :D.
> > Any other thoughts on how best to integrate perfetto into mesa?
> > -jb
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev