[Mesa-dev] Perfetto CPU/GPU tracing

Alyssa Rosenzweig alyssa.rosenzweig at collabora.com
Fri Feb 12 22:48:38 UTC 2021


Sure, I definitely see the use case for virgl :)

On Fri, Feb 12, 2021 at 02:43:25PM -0800, Chia-I Wu wrote:
> For virgl, where the biggest perf gaps often come from unnecessary CPU
> waits or high latencies of fence signaling, being able to insert
> userspace driver trace events and combine them with kernel ftrace
> events is a big plus.  Admittedly, there are no HW counters and my
> needs are simpler (inserting function begin/end and wait begin/end and
> combining them with virtio-gpu and dma-fence ftrace events).
> 
> On Fri, Feb 12, 2021 at 2:13 PM Alyssa Rosenzweig
> <alyssa.rosenzweig at collabora.com> wrote:
> >
> > My 2c for Mali/Panfrost --
> >
> > For us, capturing GPU perf counters is orthogonal to rendering. It's
> > expected (e.g. with Arm's tools) to do this from a separate process.
> > Neither Mesa nor the DDK should require custom instrumentation for the
> > low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> > Perfetto as it is. So for us I don't see the value in modifying Mesa for
> > tracing.
> >
> > On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> > > (responding from correct address this time)
> > >
> > > On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.janes at intel.com> wrote:
> > >
> > > > I've recently been using GPUVis to look at trace events.  On Intel
> > > > platforms, GPUVis incorporates ftrace events from the i915 driver,
> > > > performance metrics from igt-gpu-tools, and userspace ftrace markers
> > > > that I locally hack up in Mesa.
> > > >
> > >
> > > GPUVis is great. I would love to see that data combined with
> > > userspace events without any need for local hacks. Perfetto provides
> > > on-demand trace events with lower overhead compared to ftrace, so for
> > > example it is acceptable to have production trace instrumentation that can
> > > be captured without dev builds. Doing that with ftrace may require a way
> > > to enable and disable the ftrace file writes to avoid the overhead when
> > > tracing is not in use. This is what Android does with systrace/atrace, for
> > > example: it uses Binder to notify processes about trace sessions. Perfetto
> > > does that in a more portable way.
> > >
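To make the ftrace gating point above concrete, here is a minimal sketch of userspace markers that cost nothing beyond one integer compare while tracing is off. All function names are hypothetical (they are not from Mesa, GPUVis, or Perfetto), and the "B|pid|name" / "E|pid" text is the atrace-style convention, which I am assuming the viewer understands. How a process learns that a session started (Binder on Android, an env var, etc.) is left out. The state is a plain global, so this is single-threaded as written.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper state; -1 means "tracing off", so no writes happen. */
static int marker_fd = -1;

/* Open the marker file (e.g. /sys/kernel/tracing/trace_marker) when a
 * trace session starts. Returns true if markers can be written. */
bool marker_enable(const char *path)
{
    if (marker_fd < 0)
        marker_fd = open(path, O_WRONLY);
    return marker_fd >= 0;
}

/* Close it again when the session ends. */
void marker_disable(void)
{
    if (marker_fd >= 0) {
        close(marker_fd);
        marker_fd = -1;
    }
}

/* Format an atrace-style begin marker: "B|<pid>|<name>". */
int marker_format_begin(char *buf, size_t size, int pid, const char *name)
{
    assert(name != NULL);
    return snprintf(buf, size, "B|%d|%s", pid, name);
}

void marker_begin(const char *name)
{
    char buf[128];
    if (marker_fd < 0)
        return; /* disabled: skip the formatting and the write syscall */
    int len = marker_format_begin(buf, sizeof(buf), (int)getpid(), name);
    if (len > 0) {
        size_t n = (size_t)len < sizeof(buf) ? (size_t)len : sizeof(buf) - 1;
        ssize_t r = write(marker_fd, buf, n);
        (void)r; /* tracing is best effort */
    }
}

void marker_end(void)
{
    char buf[32];
    if (marker_fd < 0)
        return;
    int len = snprintf(buf, sizeof(buf), "E|%d", (int)getpid());
    ssize_t r = write(marker_fd, buf, (size_t)len);
    (void)r;
}
```

A driver would call marker_begin("wait_fence") / marker_end() around the region of interest; until marker_enable() succeeds both calls return immediately.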
> > >
> > > >
> > > > It is very easy to compile the GPUVis UI.  Userspace instrumentation
> > > > requires a single C/C++ header.  You don't have to access an external
> > > > web service to analyze trace data (a big no-no for devs working on
> > > > preproduction hardware).
> > > >
> > > > Is it possible to build and run the Perfetto UI locally?
> > >
> > >
> > > Yes, local UI builds are possible
> > > <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> > > Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> > > trace data is not uploaded unless you use the 'share' feature.
> > >
> > >
> > > >   Can it display
> > > > arbitrary trace events that are written to
> > > > /sys/kernel/tracing/trace_marker ?
> > >
> > >
> > > > Yes, I believe it does support that via the linux.ftrace data source
> > > > <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that, for
> > > > example, to overlay CPU sched data to show what process is on each core
> > > > throughout the timeline. There are many ftrace event types
> > > > <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> > > > in the perfetto protos.
> > >
> > >
> > > > Can it be extended to show i915 and
> > > > i915-perf-recorder events?
> > > >
> > >
> > > It can be extended to consume custom data sources. One way this is done is
> > > via a bridge daemon, such as traced_probes which is responsible for
> > > capturing data from ftrace and /proc during a trace session and sending it
> > > to traced. traced is the main perfetto tracing daemon; it notifies all
> > > trace data sources to start/stop tracing and handles user tracing
> > > requests made via the 'perfetto' command.
> > >
> > >
> > >
> > > >
> > > > John Bates <jbates at chromium.org> writes:
> > > >
> > > > > I recently opened issue 4262
> > > > > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> > > > > discussion on integrating perfetto into mesa.
> > > > >
> > > > > *Background*
> > > > >
> > > > > System-wide tracing is an invaluable tool for developers to find and fix
> > > > > performance problems. The perfetto project enables a combined view of
> > > > > > trace data from kernel ftrace, GPU driver and various manually-instrumented
> > > > > tracepoints throughout the application and system. This helps developers
> > > > > quickly answer questions like:
> > > > >
> > > > >    - How long are frames taking?
> > > > >    - What caused a particular frame drop?
> > > > >    - Is it CPU bound or GPU bound?
> > > > >    - Did a CPU core frequency drop cause something to go slower than
> > > > > >    usual?
> > > > >    - Is something else running that is stealing CPU or GPU time? Could I
> > > > >    fix that with better thread/context priorities?
> > > > >    - Are all CPU cores being used effectively? Do I need
> > > > > >    sched_setaffinity to keep my thread on a big or little core?
> > > > >    - What’s the latency between CPU frame submit and GPU start?
> > > > >
> > > > > *What Does Mesa + Perfetto Provide?*
> > > > >
> > > > > Mesa is in a unique position to produce GPU trace data for several GPU
> > > > > vendors without requiring the developer to build and install additional
> > > > > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> > > > >
> > > > > The key is making it easy for developers to use. Ideally, perfetto is
> > > > > > eventually available by default in mesa so that if your system has
> > > > > > perfetto's traced daemon running, you just need to run perfetto (perhaps
> > > > > > along with setting an environment variable) with the mesa categories to
> > > > > > see:
> > > > >
> > > > >    - GPU processing timeline events.
> > > > >    - GPU counters.
> > > > >    - CPU events for potentially slow functions in mesa like shader
> > > > > >    compiles.
> > > > >
> > > > > Example of what this data might look like (with fake GPU events):
> > > > > [image: percetto-gpu-example.png]
> > > > >
> > > > > *Runtime Characteristics*
> > > > >
> > > > > >    - ~500KB additional binary size. Even when using only the basic
> > > > > >    features of perfetto, it will increase the binary size of mesa by
> > > > > >    about 500KB.
> > > > > >    - Background thread. Perfetto uses a background thread for
> > > > > >    communication with the system tracing daemon (traced) to advertise
> > > > > >    trace data and get notification of trace start/stop.
> > > > > >    - Runtime overhead when disabled is designed to be minimal: one
> > > > > >    predicted branch, typically a few CPU cycles
> > > > > >    <https://perfetto.dev/docs/instrumentation/track-events#performance>
> > > > > >    per event. While enabled, the overhead can be around 1 us per event.
> > > > >
> > > > > *Integration Challenges*
> > > > >
> > > > >    - The perfetto SDK is C++ and designed around macros, lambdas, inline
> > > > >    templates, etc. There are ongoing discussions on providing an official
> > > > > >    perfetto C API, but it is not yet clear when this will land on the
> > > > > >    perfetto roadmap.
> > > > >    - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
> > > > >    lines of code.
> > > > >    - Anything that includes perfetto.h takes a long time to compile.
> > > > >    - The current Perfetto SDK design is incompatible with being a shared
> > > > >    library behind a C API.
> > > > >
> > > > > *Percetto*
> > > > >
> > > > > The percetto library <https://github.com/olvaffe/percetto> was recently
> > > > > > implemented to provide an interim C API for perfetto. It provides
> > > > > > efficient support for scoped trace events, multiple categories,
> > > > > > counters, custom timestamps, and debug data annotations. Percetto also
> > > > > > provides some features that are important to mesa, but not available
> > > > > > yet with perfetto SDK:
> > > > >
> > > > >    - Trace events from multiple perfetto instances in separate shared
> > > > > >    libraries (like mesa and virglrenderer) show correctly in a single
> > > > > >    process and thread view.
> > > > >    - Counter tracks and macro API.
> > > > >
> > > > > > Percetto is missing an API for perfetto's GPU DataSource and counter
> > > > > > support, but that feature could be implemented next if it is important
> > > > > > for mesa. With the existing percetto API mesa could present GPU trace
> > > > > > data as named 'slice' events and int64_t counters with custom timestamps
> > > > > > as shown in the image above (based on this sample
> > > > > > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> > > > >
> > > > > *Mesa Integration Alternatives*
> > > > >
> > > > > > Note: we have some pressing needs for performance analysis in Chrome OS,
> > > > > > so I'm intentionally leaving out the alternative of waiting for an
> > > > > > official perfetto C API. Of course, once that C API is available it
> > > > > > would become an option to migrate to it from any of the alternatives
> > > > > > below.
> > > > >
> > > > > Ordered by difficulty with easiest first:
> > > > >
> > > > > >    1. Statically link with percetto as an optional external dependency
> > > > > >    (virglrenderer now has this approach
> > > > > >    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> > > > > >       - Pros: API already supports most common tracing needs. Tested and
> > > > > >       used by an increasing number of CrOS components.
> > > > >       - Cons: External dependency for optional mesa build option.
> > > > >    2. Embed Perfetto SDK + a Percetto fork/copy.
> > > > >       - Pros: API already supports most common tracing needs. No added
> > > > >       external dependency for mesa.
> > > > > >       - Cons: Percetto code divergence, bug fixes need to land in two
> > > > > >       trees.
> > > > >    3. Embed Perfetto SDK + custom C wrapper.
> > > > >       - Pros: Tailored API for mesa's needs.
> > > > >       - Cons: Nontrivial development efforts and maintenance.
> > > > >    4. Generate C stubs for the Perfetto protobuf and reimplement the
> > > > >    Perfetto SDK in C.
> > > > > >       - Pros: Tailored API for mesa's needs. Possible smaller binary
> > > > > >       impact from simpler implementation.
> > > > >       - Cons: Significant development efforts and maintenance.
> > > > >
> > > > > Regardless of the integration direction, I expect we would disable
> > > > > > perfetto in the default build for now to minimize disruption.
> > > > >
> > > > > I like #1, because there are some nontrivial subtleties to the C wrapper
> > > > > > that provide both API conveniences and runtime performance that would
> > > > > > need to be reimplemented or maintained with the other options. I will also
> > > > > volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
> > > > >
> > > > > Any other thoughts on how best to integrate perfetto into mesa?
> > > > >
> > > > > -jb
> > > > > _______________________________________________
> > > > > mesa-dev mailing list
> > > > > mesa-dev at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >

