[Mesa-dev] Perfetto CPU/GPU tracing

Rob Clark robdclark at gmail.com
Sat Feb 13 01:10:13 UTC 2021


On Fri, Feb 12, 2021 at 4:51 PM Mark Janes <mark.a.janes at intel.com> wrote:
>
> Rob Clark <robdclark at gmail.com> writes:
>
> > On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T
> > <eero.t.tamminen at intel.com> wrote:
> >>
> >> Hi,
> >>
> >> On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
> >> > I recently opened issue 4262
> >> > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> >> > discussion on integrating perfetto into mesa.
> >> >
> >> > *Background*
> >> >
> >> > System-wide tracing is an invaluable tool for developers to find and fix
> >> > performance problems. The perfetto project enables a combined view of trace
> >> > data from kernel ftrace, GPU driver and various manually-instrumented
> >> > tracepoints throughout the application and system.
> >>
> >> Unlike some other Linux tracing solutions, Perfetto appears to be for
> >> Android / Chrome(OS?), and not available in common Linux distro
> >> repos.
> >
> > I don't think there is anything about perfetto that would not be
> > usable in a generic linux distro.. and mesa support for perfetto would
> > perhaps be a compelling reason for distros to add support
> >
> >> So, why Perfetto instead of one of the other solutions, e.g. from ones
> >> mentioned here:
> >>         https://tracingsummit.org/ts/2018/
> >> ?
> >>
> >> And, if a tracing API is added to Mesa, shouldn't it also support
> >> tracepoints for other tracing solutions?
> >
> > perfetto does have systrace collectors
> >
> > And a general comment on perfetto vs other things.. we end up needing
> > to support perfetto regardless (for android and CrOS).. we don't
> > *need* to enable it on generic linux, but I think we should (but maybe
> > using the mode that does not require a system server.. at least
> > initially.. that may limit its ability to collect systrace and traces
> > from other parts of the system, but that wouldn't depend on distros
> > enabling perfetto system server).
>
> Perfetto seems like an awful lot of infrastructure to capture trace
> events.  Why not follow the example of GPUVis, and write generic
> trace_markers to ftrace?  It limits impact to Mesa, while allowing any
> trace visualizer to use the trace points.

I'm not really seeing how that would cover anything more than CPU
based events.. which is kind of the smallest part of what I'm
interested in..
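
For reference, a minimal sketch of what the trace_marker approach looks
like from userspace.  The helper names are made up for illustration, and
the tracefs path is an assumption (older setups mount it under
/sys/kernel/debug/tracing).  The event is timestamped by the kernel when
write() is called, so it lands on the CPU timeline:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open the ftrace marker file.  The path assumes
 * tracefs is mounted at /sys/kernel/tracing. */
int
open_trace_marker(void)
{
   return open("/sys/kernel/tracing/trace_marker", O_WRONLY);
}

/* Hypothetical helper: format one marker and write it to an already-open
 * fd.  The kernel assigns the timestamp at write() time; the caller has
 * no way to supply a timestamp that was captured on the GPU. */
int
emit_marker(int fd, const char *name, unsigned frame)
{
   char buf[128];
   int len = snprintf(buf, sizeof(buf), "mesa: %s frame=%u\n", name, frame);

   if (len < 0 || (size_t)len >= sizeof(buf))
      return -1;

   return write(fd, buf, (size_t)len) == len ? 0 : -1;
}
```

That is fine for marking CPU-side events like a buffer swap, but it has
no slot for "this draw started at GPU tick N".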

> >> I mean, code added to drivers themselves preferably should not have
> >> anything perfetto/percetto specific.  Tracing system specific code
> >> should be only in one place (even if it's just macros in a common header).
> >>
> >>
> >> > This helps developers
> >> > quickly answer questions like:
> >> >
> >> >    - How long are frames taking?
> >>
> >> That doesn't require any changes to Mesa.  Just set uprobe for suitable
> >> buffer swap function [1], and parse kernel ftrace events.  This way
> >> starting tracing doesn't require even restarting the tracked processes.
> >>
> >
> > But this doesn't tell you how long the GPU is spending doing what.  My
> > rough idea is to hook up an optional callback to u_tracepoint so we
> > can generate perfetto traces on the GPU timeline (ie. with
> > timestamps captured from GPU), fwiw
>
> I implemented a feature called INTEL_MEASURE based off of a tool that
> Ken wrote.  It captures render/batch/frame timestamps in a BO, providing
> durations on the GPU timeline.  It works for Iris and Anv.
>
> The approach provides accurate gpu timing, with minimal stalling.  This
> data could be presented in Perfetto or GPUVis.

have a look at u_trace.. it is basically this but implemented in a way
that is (hopefully) useful to other drivers..

(there is some small gallium dependency currently, although at some
point when I'm spending more time on vk optimization I'll hoist it out
of gallium/aux/util, unless someone else gets there first)
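
To make the GPU-timeline part concrete, here is a rough sketch of the
kind of conversion needed before GPU-captured timestamps can be emitted
as events with custom timestamps.  This is not u_trace's actual API; the
struct and function names are made up, and it assumes the driver can
sample the GPU counter and the CPU clock together once to get a sync
point:

```c
#include <stdint.h>

/* Hypothetical sync point: a GPU counter value and a CPU monotonic
 * clock value sampled at (roughly) the same moment. */
struct clock_sync {
   uint64_t gpu_ticks;   /* GPU timestamp counter at the sync point */
   uint64_t cpu_ns;      /* CPU clock at the sync point, in ns */
   uint64_t gpu_freq_hz; /* GPU timestamp counter frequency */
};

/* Map a timestamp read back from the GPU onto the CPU clock. */
uint64_t
gpu_ticks_to_cpu_ns(const struct clock_sync *sync, uint64_t ticks)
{
   /* Signed delta, so timestamps slightly before the sync point work. */
   int64_t delta = (int64_t)(ticks - sync->gpu_ticks);
   int64_t freq = (int64_t)sync->gpu_freq_hz;

   /* Split into whole seconds and remainder to avoid 64-bit overflow. */
   int64_t ns = (delta / freq) * 1000000000 +
                (delta % freq) * 1000000000 / freq;

   return sync->cpu_ns + (uint64_t)ns;
}
```

A real implementation would also need to resync periodically to deal
with drift between the two clocks.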

BR,
-R

> > BR,
> > -R
> >
> >> [1] glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
> >> anv_QueuePresentKHR[2]..
> >>
> >> [2] Many apps resolve "vkQueuePresentKHR" Vulkan API loader wrapper
> >> function and call the backend function like "anv_QueuePresentKHR"
> >> directly, so it's better to track the latter instead.
> >>
> >>
> >> >    - What caused a particular frame drop?
> >> >    - Is it CPU bound or GPU bound?
> >>
> >> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
> >> utilization (which is a lower level thing).
> >>
> >>
> >> >    - Did a CPU core frequency drop cause something to go slower than usual?
> >>
> >> Note that nowadays actual CPU frequencies are often controlled by HW /
> >> firmware, so you don't necessarily get any ftrace event from freq
> >> change, you would need to poll MSR registers instead (which is a
> >> privileged operation, and polling can easily miss changes).
> >>
> >>
> >> >    - Is something else running that is stealing CPU or GPU time? Could I
> >> >    fix that with better thread/context priorities?
> >> >    - Are all CPU cores being used effectively? Do I need sched_setaffinity
> >> >    to keep my thread on a big or little core?
> >>
> >> I don't think these require adding tracepoints to Mesa either...
> >>
> >>
> >> >    - What’s the latency between CPU frame submit and GPU start?
> >>
> >> I think this would require tracepoints in kernel GPU code more than in
> >> Mesa?
> >>
> >>
> >>         - Eero
> >>
> >>
> >> > *What Does Mesa + Perfetto Provide?*
> >> >
> >> > Mesa is in a unique position to produce GPU trace data for several GPU
> >> > vendors without requiring the developer to build and install
> >> > additional
> >> > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> >> >
> >> > The key is making it easy for developers to use. Ideally, perfetto is
> >> > eventually available by default in mesa so that if your system has
> >> > perfetto
> >> > traced running, you just need to run perfetto (perhaps along with
> >> > setting
> >> > an environment variable) with the mesa categories to see:
> >> >
> >> >    - GPU processing timeline events.
> >> >    - GPU counters.
> >> >    - CPU events for potentially slow functions in mesa like shader compiles.
> >> >
> >> > Example of what this data might look like (with fake GPU events):
> >> > [image: percetto-gpu-example.png]
> >> >
> >> > *Runtime Characteristics*
> >> >
> >> >    - ~500KB additional binary size. Even with using only the basic features
> >> >    of perfetto, it will increase the binary size of mesa by about 500KB.
> >> >    - Background thread. Perfetto uses a background thread for communication
> >> >    with the system tracing daemon (traced) to advertise trace data and get
> >> >    notification of trace start/stop.
> >> >    - Runtime overhead when disabled is designed to be optimal with one
> >> >    predicted branch, typically a few CPU cycles
> >> >    <https://perfetto.dev/docs/instrumentation/track-events#performance>
> >> >    per event. While enabled, the overhead can be around 1 us per event.
> >> >
> >> > *Integration Challenges*
> >> >
> >> >    - The perfetto SDK is C++ and designed around macros, lambdas, inline
> >> >    templates, etc. There are ongoing discussions on providing an official
> >> >    perfetto C API, but it is not yet clear when this will land on the perfetto
> >> >    roadmap.
> >> >    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> >> >    lines of code.
> >> >    - Anything that includes perfetto.h takes a long time to compile.
> >> >    - The current Perfetto SDK design is incompatible with being a shared
> >> >    library behind a C API.
> >> >
> >> > *Percetto*
> >> >
> >> > The percetto library <https://github.com/olvaffe/percetto> was
> >> > recently
> >> > implemented to provide an interim C API for perfetto. It provides
> >> > efficient
> >> > support for scoped trace events, multiple categories, counters, custom
> >> > timestamps, and debug data annotations. Percetto also provides some
> >> > features that are important to mesa, but not available yet with
> >> > perfetto
> >> > SDK:
> >> >
> >> >    - Trace events from multiple perfetto instances in separate shared
> >> >    libraries (like mesa and virglrenderer) show correctly in a single process
> >> >    and thread view.
> >> >    - Counter tracks and macro API.
> >> >
> >> > Percetto is missing API for perfetto's GPU DataSource and counter
> >> > support,
> >> > but that feature could be implemented next if it is important for
> >> > mesa.
> >> > With the existing percetto API mesa could present GPU trace data as
> >> > named
> >> > 'slice' events and int64_t counters with custom timestamps as shown in
> >> > the
> >> > image above (based on this sample
> >> > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> >> >
> >> > *Mesa Integration Alternatives*
> >> >
> >> > Note: we have some pressing needs for performance analysis in Chrome
> >> > OS, so
> >> > I'm intentionally leaving out the alternative of waiting for an
> >> > official
> >> > perfetto C API. Of course, once that C API is available it would
> >> > become an
> >> > option to migrate to it from any of the alternatives below.
> >> >
> >> > Ordered by difficulty with easiest first:
> >> >
> >> >    1. Statically link with percetto as an optional external dependency
> >> >    (virglrenderer now has this approach
> >> >    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> >> >       - Pros: API already supports most common tracing needs. Tested and
> >> >       used by an increasing number of CrOS components.
> >> >       - Cons: External dependency for optional mesa build option.
> >> >    2. Embed Perfetto SDK + a Percetto fork/copy.
> >> >       - Pros: API already supports most common tracing needs. No added
> >> >       external dependency for mesa.
> >> >       - Cons: Percetto code divergence, bug fixes need to land in two trees.
> >> >    3. Embed Perfetto SDK + custom C wrapper.
> >> >       - Pros: Tailored API for mesa's needs.
> >> >       - Cons: Nontrivial development efforts and maintenance.
> >> >    4. Generate C stubs for the Perfetto protobuf and reimplement the
> >> >    Perfetto SDK in C.
> >> >       - Pros: Tailored API for mesa's needs. Possible smaller binary impact
> >> >       from simpler implementation.
> >> >       - Cons: Significant development efforts and maintenance.
> >> >
> >> > Regardless of the integration direction, I expect we would disable
> >> > perfetto
> >> > in the default build for now to minimize disruption.
> >> >
> >> > I like #1, because there are some nontrivial subtleties to the C
> >> > wrapper
> >> > that provide both API conveniences and runtime performance that would
> >> > need
> >> > to be reimplemented or maintained with the other options. I will also
> >> > volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4
> >> > :D.
> >> >
> >> > Any other thoughts on how best to integrate perfetto into mesa?
> >> >
> >> > -jb
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>

