[Mesa-dev] Perfetto CPU/GPU tracing

Rob Clark robdclark at gmail.com
Fri Feb 12 23:26:40 UTC 2021


yes, but that is a limitation of mali which does not apply to a lot of
other drivers ;-)

But AFAIU typically you'd use perfetto with a sort of system server
collecting trace data from various different processes, so the fact
that the mali perf counters come from somewhere else doesn't
really matter.

And this is about more than just perf counters. I plan to wire up the
u_tracepoint stuff to perfetto events (or rather provide a way to hook
up individual tracepoints) so that we can see on a timeline things
like how long the binning pass took and how long tile passes take (broken
down into restore/draw/resolve).  I think we most definitely want
perfetto support in mesa.  It can be optional, but I'm hoping linux
distros start enabling perfetto when they have a compelling reason to
(i.e. mesa gpu perf analysis).
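
A rough sketch in plain C of what "hooking up individual tracepoints"
could look like. Every name here (tp_stage, tp_set_hook, tp_begin,
tp_end, print_slice) is made up for illustration and is not the actual
u_tracepoint or perfetto API. Timestamps are passed in by hand; in a
real driver they would presumably come from the GPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stages of a tiled render pass (illustrative names). */
enum tp_stage { TP_BINNING, TP_RESTORE, TP_DRAW, TP_RESOLVE, TP_STAGE_COUNT };

static const char *const tp_stage_names[TP_STAGE_COUNT] = {
   "binning", "restore", "draw", "resolve",
};

/* A consumer (e.g. a perfetto backend) receives one slice per stage. */
typedef void (*tp_slice_hook)(void *data, enum tp_stage stage,
                              uint64_t start_ns, uint64_t end_ns);

static tp_slice_hook tp_hooks[TP_STAGE_COUNT];
static void *tp_hook_data[TP_STAGE_COUNT];
static uint64_t tp_start_ns[TP_STAGE_COUNT];

/* Hook up (or, with hook == NULL, unhook) an individual tracepoint. */
static void tp_set_hook(enum tp_stage stage, tp_slice_hook hook, void *data)
{
   assert(stage < TP_STAGE_COUNT);
   tp_hooks[stage] = hook;
   tp_hook_data[stage] = data;
}

/* Record the start timestamp of a stage. */
static void tp_begin(enum tp_stage stage, uint64_t ts_ns)
{
   tp_start_ns[stage] = ts_ns;
}

/* Close a stage; only stages with a hook installed produce a slice. */
static void tp_end(enum tp_stage stage, uint64_t ts_ns)
{
   if (tp_hooks[stage])
      tp_hooks[stage](tp_hook_data[stage], stage, tp_start_ns[stage], ts_ns);
}

/* Example consumer: accumulate time per stage and print each slice. */
static void print_slice(void *data, enum tp_stage stage,
                        uint64_t start_ns, uint64_t end_ns)
{
   uint64_t *totals = data;
   totals[stage] += end_ns - start_ns;
   printf("%-8s %8llu ns\n", tp_stage_names[stage],
          (unsigned long long)(end_ns - start_ns));
}
```

A caller would install print_slice on the stages it cares about with
tp_set_hook() and bracket each stage with tp_begin()/tp_end(); stages
without a hook cost one pointer check in tp_end().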

BR,
-R

On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
<alyssa.rosenzweig at collabora.com> wrote:
>
> My 2c for Mali/Panfrost --
>
> For us, capturing GPU perf counters is orthogonal to rendering. It's
> expected (e.g. with Arm's tools) to do this from a separate process.
> Neither Mesa nor the DDK should require custom instrumentation for the
> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> tracing.
>
> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> > (responding from correct address this time)
> >
> > On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.janes at intel.com> wrote:
> >
> > > I've recently been using GPUVis to look at trace events.  On Intel
> > > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > > that I locally hack up in Mesa.
> > >
> >
> > GPUVis is great. I would love to see that data combined with
> > userspace events without any need for local hacks. Perfetto provides
> > on-demand trace events with lower overhead compared to ftrace, so for
> > example it is acceptable to have production trace instrumentation that can
> > be captured without dev builds. Doing that with ftrace may require a way
> > to enable and disable the ftrace file writes to avoid the overhead when
> > tracing is not in use. This is what Android does with systrace/atrace, for
> > example: it uses Binder to notify processes about trace sessions. Perfetto
> > does that in a more portable way.
> >
> >
> > >
> > > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > > requires a single C/C++ header.  You don't have to access an external
> > > web service to analyze trace data (a big no-no for devs working on
> > > preproduction hardware).
> > >
> > > Is it possible to build and run the Perfetto UI locally?
> >
> >
> > Yes, local UI builds are possible
> > <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> > Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> > trace data is not uploaded unless you use the 'share' feature.
> >
> >
> > >   Can it display
> > > arbitrary trace events that are written to
> > > /sys/kernel/tracing/trace_marker ?
> >
> >
> > Yes, I believe it does support that via linux.ftrace data source
> > <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
> > example to overlay CPU sched data to show what process is on each core
> > throughout the timeline. There are many ftrace event types
> > <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> > in the perfetto protos.
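
For reference, a minimal C sketch of writing userspace markers to an
already-open trace_marker file descriptor. It uses the atrace-style
text format ("B|<pid>|<name>" to begin a slice, "E|<pid>" to end it)
that Android's systrace writes; whether a given viewer parses that
format is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format an atrace-style "begin" marker: B|<pid>|<name>.
 * Returns what snprintf returns (the untruncated length). */
static int format_begin_marker(char *buf, size_t size, int pid,
                               const char *name)
{
   assert(buf != NULL && name != NULL);
   return snprintf(buf, size, "B|%d|%s", pid, name);
}

/* Format an atrace-style "end" marker: E|<pid>. */
static int format_end_marker(char *buf, size_t size, int pid)
{
   assert(buf != NULL);
   return snprintf(buf, size, "E|%d", pid);
}

/* Write one NUL-terminated marker to fd in a single write() call.
 * Returns 0 on success, -1 on a bad fd or a short/failed write. */
static int write_marker(int fd, const char *marker)
{
   size_t len = strlen(marker);

   if (fd < 0 || len == 0)
      return -1;
   return write(fd, marker, len) == (ssize_t)len ? 0 : -1;
}
```

The fd would typically come from opening
/sys/kernel/tracing/trace_marker for writing once at startup, which
usually needs elevated permissions.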
> >
> >
> > > Can it be extended to show i915 and
> > > i915-perf-recorder events?
> > >
> >
> > It can be extended to consume custom data sources. One way this is done is
> > via a bridge daemon, such as traced_probes which is responsible for
> > capturing data from ftrace and /proc during a trace session and sending it
> > to traced. traced is the main perfetto tracing daemon that notifies all
> > trace data sources to start/stop tracing and communicates with user tracing
> > requests via the 'perfetto' command.
> >
> >
> >
> > >
> > > John Bates <jbates at chromium.org> writes:
> > >
> > > > I recently opened issue 4262
> > > > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> > > > discussion on integrating perfetto into mesa.
> > > >
> > > > *Background*
> > > >
> > > > System-wide tracing is an invaluable tool for developers to find and fix
> > > > performance problems. The perfetto project enables a combined view of
> > > > trace data from kernel ftrace, GPU driver and various
> > > > manually-instrumented tracepoints throughout the application and system.
> > > > This helps developers quickly answer questions like:
> > > >
> > > >    - How long are frames taking?
> > > >    - What caused a particular frame drop?
> > > >    - Is it CPU bound or GPU bound?
> > > >    - Did a CPU core frequency drop cause something to go slower than
> > > >    usual?
> > > >    - Is something else running that is stealing CPU or GPU time? Could I
> > > >    fix that with better thread/context priorities?
> > > >    - Are all CPU cores being used effectively? Do I need
> > > >    sched_setaffinity to keep my thread on a big or little core?
> > > >    - What’s the latency between CPU frame submit and GPU start?
> > > >
> > > > *What Does Mesa + Perfetto Provide?*
> > > >
> > > > Mesa is in a unique position to produce GPU trace data for several GPU
> > > > vendors without requiring the developer to build and install additional
> > > > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> > > >
> > > > The key is making it easy for developers to use. Ideally, perfetto is
> > > > eventually available by default in mesa so that if your system has
> > > > perfetto traced running, you just need to run perfetto (perhaps along
> > > > with setting an environment variable) with the mesa categories to see:
> > > >
> > > >    - GPU processing timeline events.
> > > >    - GPU counters.
> > > >    - CPU events for potentially slow functions in mesa like shader
> > > >    compiles.
> > > >
> > > > Example of what this data might look like (with fake GPU events):
> > > > [image: percetto-gpu-example.png]
> > > >
> > > > *Runtime Characteristics*
> > > >
> > > >    - ~500KB additional binary size. Even with using only the basic
> > > >    features of perfetto, it will increase the binary size of mesa by
> > > >    about 500KB.
> > > >    - Background thread. Perfetto uses a background thread for
> > > >    communication with the system tracing daemon (traced) to advertise
> > > >    trace data and get notification of trace start/stop.
> > > >    - Runtime overhead when disabled is designed to be optimal with one
> > > >    predicted branch, typically a few CPU cycles
> > > >    <https://perfetto.dev/docs/instrumentation/track-events#performance>
> > > >    per event. While enabled, the overhead can be around 1 us per event.
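
To illustrate the "one predicted branch" point above, a minimal C11
sketch of an enabled-check in front of an event emit. MY_TRACE_EVENT,
trace_enabled, set_tracing and emit_event are hypothetical names, not
the perfetto or percetto macros, and __builtin_expect is a GCC/Clang
extension.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flipped by a (hypothetical) background thread when a trace session
 * starts or stops. Static storage, so it starts out false. */
static atomic_bool trace_enabled;

/* Number of events that took the slow path; stands in for a real
 * write into a trace buffer. */
static uint64_t events_emitted;

static void emit_event(const char *name)
{
   assert(name != NULL);
   events_emitted++;
}

/* When tracing is off this costs one relaxed load plus one branch
 * that the compiler is told to expect as not taken. */
#define MY_TRACE_EVENT(name)                                           \
   do {                                                                \
      if (__builtin_expect(                                            \
             atomic_load_explicit(&trace_enabled,                      \
                                  memory_order_relaxed), 0))           \
         emit_event(name);                                             \
   } while (0)

/* Called on trace session start/stop notifications. */
static void set_tracing(bool on)
{
   atomic_store_explicit(&trace_enabled, on, memory_order_relaxed);
}
```

With this shape, instrumentation such as MY_TRACE_EVENT("shader_compile")
can stay in production builds and only does real work between
set_tracing(true) and set_tracing(false).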
> > > >
> > > > *Integration Challenges*
> > > >
> > > >    - The perfetto SDK is C++ and designed around macros, lambdas, inline
> > > >    templates, etc. There are ongoing discussions on providing an official
> > > >    perfetto C API, but it is not yet clear when this will land on the
> > > >    perfetto roadmap.
> > > >    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> > > >    lines of code.
> > > >    - Anything that includes perfetto.h takes a long time to compile.
> > > >    - The current Perfetto SDK design is incompatible with being a shared
> > > >    library behind a C API.
> > > >
> > > > *Percetto*
> > > >
> > > > The percetto library <https://github.com/olvaffe/percetto> was recently
> > > > implemented to provide an interim C API for perfetto. It provides
> > > > efficient support for scoped trace events, multiple categories, counters,
> > > > custom timestamps, and debug data annotations. Percetto also provides some
> > > > features that are important to mesa, but not available yet with perfetto
> > > > SDK:
> > > >
> > > >    - Trace events from multiple perfetto instances in separate shared
> > > >    libraries (like mesa and virglrenderer) show correctly in a single
> > > >    process and thread view.
> > > >    - Counter tracks and macro API.
> > > >
> > > > Percetto is missing an API for perfetto's GPU DataSource and counter
> > > > support, but that feature could be implemented next if it is important
> > > > for mesa. With the existing percetto API mesa could present GPU trace
> > > > data as named 'slice' events and int64_t counters with custom timestamps
> > > > as shown in the image above (based on this sample
> > > > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> > > >
> > > > *Mesa Integration Alternatives*
> > > >
> > > > Note: we have some pressing needs for performance analysis in Chrome OS,
> > > > so I'm intentionally leaving out the alternative of waiting for an
> > > > official perfetto C API. Of course, once that C API is available it would
> > > > become an option to migrate to it from any of the alternatives below.
> > > >
> > > > Ordered by difficulty with easiest first:
> > > >
> > > >    1. Statically link with percetto as an optional external dependency
> > > >    (virglrenderer now has this approach
> > > >    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> > > >       - Pros: API already supports most common tracing needs. Tested and
> > > >       used by an increasing number of CrOS components.
> > > >       - Cons: External dependency for optional mesa build option.
> > > >    2. Embed Perfetto SDK + a Percetto fork/copy.
> > > >       - Pros: API already supports most common tracing needs. No added
> > > >       external dependency for mesa.
> > > >       - Cons: Percetto code divergence, bug fixes need to land in two
> > > >       trees.
> > > >    3. Embed Perfetto SDK + custom C wrapper.
> > > >       - Pros: Tailored API for mesa's needs.
> > > >       - Cons: Nontrivial development efforts and maintenance.
> > > >    4. Generate C stubs for the Perfetto protobuf and reimplement the
> > > >    Perfetto SDK in C.
> > > >       - Pros: Tailored API for mesa's needs. Possible smaller binary
> > > >       impact from simpler implementation.
> > > >       - Cons: Significant development efforts and maintenance.
> > > >
> > > > Regardless of the integration direction, I expect we would disable
> > > > perfetto in the default build for now to minimize disruption.
> > > >
> > > > I like #1, because there are some nontrivial subtleties to the C wrapper
> > > > that provide both API conveniences and runtime performance that would
> > > > need to be reimplemented or maintained with the other options. I will
> > > > also volunteer to do #1 or #2, but I'm not sure I have time for #3 or
> > > > #4 :D.
> > > >
> > > > Any other thoughts on how best to integrate perfetto into mesa?
> > > >
> > > > -jb
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >

