[Mesa-dev] Perfetto CPU/GPU tracing

Lionel Landwerlin lionel.g.landwerlin at intel.com
Sat Feb 13 01:08:17 UTC 2021


We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want 
to build a timeline of the work (because of preemption, etc.).

The best information we could add from mesa would be a timestamp of when a 
particular drawcall started.
But that's pretty much what timestamp queries are.

Were you thinking of particular GPU generated data you don't get from 
gfx-pps?

Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
> My 2c for Mali/Panfrost --
>
> For us, capturing GPU perf counters is orthogonal to rendering. It's
> expected (e.g. with Arm's tools) to do this from a separate process.
> Neither Mesa nor the DDK should require custom instrumentation for the
> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> tracing.
>
> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
>> (responding from correct address this time)
>>
>> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.janes at intel.com> wrote:
>>
>>> I've recently been using GPUVis to look at trace events.  On Intel
>>> platforms, GPUVis incorporates ftrace events from the i915 driver,
>>> performance metrics from igt-gpu-tools, and userspace ftrace markers
>>> that I locally hack up in Mesa.
>>>
>> GPUVis is great. I would love to see that data combined with
>> userspace events without any need for local hacks. Perfetto provides
>> on-demand trace events with lower overhead compared to ftrace, so for
>> example it is acceptable to have production trace instrumentation that can
>> be captured without dev builds. Doing that with ftrace may require a way
>> to enable and disable the ftrace file writes to avoid the overhead when
>> tracing is not in use. This is what Android does with systrace/atrace,
>> for example: it uses Binder to notify processes about trace sessions.
>> Perfetto does that in a more portable way.
>>
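The gating described above could look roughly like the sketch below. It is only an illustration under stated assumptions: trace_enable(), trace_disable() and trace_mark() are hypothetical names (not Mesa, GPUVis or Perfetto API), and the mechanism that tells the process a session has started (Binder on Android, traced for Perfetto) is left out. The marker file is opened when a session begins, and every emit is guarded by a flag so nothing is written while tracing is off.

```c
#include <fcntl.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-process state: is a trace session active, and where
 * do markers go? */
static atomic_bool trace_enabled = false;
static int marker_fd = -1;

/* Called when a trace session starts. Returns false if the marker file
 * cannot be opened (e.g. tracefs not mounted or no permission). */
static bool trace_enable(const char *path)
{
    marker_fd = open(path, O_WRONLY);
    if (marker_fd < 0)
        return false;
    atomic_store(&trace_enabled, true);
    return true;
}

/* Called when the trace session stops. */
static void trace_disable(void)
{
    atomic_store(&trace_enabled, false);
    if (marker_fd >= 0) {
        close(marker_fd);
        marker_fd = -1;
    }
}

/* Emits one marker. Returns the number of bytes written, 0 when tracing
 * is off, or -1 on a write error. Each marker is a single write() call. */
static int trace_mark(const char *msg)
{
    if (!atomic_load_explicit(&trace_enabled, memory_order_relaxed))
        return 0; /* disabled: no syscall, no file access */
    ssize_t n = write(marker_fd, msg, strlen(msg));
    return n < 0 ? -1 : (int)n;
}
```

A caller would pass /sys/kernel/tracing/trace_marker to trace_enable(). The sketch ignores the race between trace_disable() and a concurrent trace_mark(), which a real implementation would have to handle.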
>>
>>> It is very easy to compile the GPUVis UI.  Userspace instrumentation
>>> requires a single C/C++ header.  You don't have to access an external
>>> web service to analyze trace data (a big no-no for devs working on
>>> preproduction hardware).
>>>
>>> Is it possible to build and run the Perfetto UI locally?
>>
>> Yes, local UI builds are possible
>> <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
>> Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
>> trace data is not uploaded unless you use the 'share' feature.
>>
>>
>>>    Can it display
>>> arbitrary trace events that are written to
>>> /sys/kernel/tracing/trace_marker ?
>>
>> Yes, I believe it does support that via linux.ftrace data source
>> <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
>> example to overlay CPU sched data to show what process is on each core
>> throughout the timeline. There are many ftrace event types
>> <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
>> in the perfetto protos.
>>
>>
>>> Can it be extended to show i915 and
>>> i915-perf-recorder events?
>>>
>> It can be extended to consume custom data sources. One way this is done is
>> via a bridge daemon, such as traced_probes which is responsible for
>> capturing data from ftrace and /proc during a trace session and sending it
>> to traced. traced is the main perfetto tracing daemon that notifies all
>> trace data sources to start/stop tracing and communicates with user tracing
>> requests via the 'perfetto' command.
>>
>>
>>
>>> John Bates <jbates at chromium.org> writes:
>>>
>>>> I recently opened issue 4262
>>>> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
>>>> discussion on integrating perfetto into mesa.
>>>>
>>>> *Background*
>>>>
>>>> System-wide tracing is an invaluable tool for developers to find and fix
>>>> performance problems. The perfetto project enables a combined view of
>>>> trace data from kernel ftrace, GPU driver and various manually-instrumented
>>>> tracepoints throughout the application and system. This helps developers
>>>> quickly answer questions like:
>>>>
>>>>     - How long are frames taking?
>>>>     - What caused a particular frame drop?
>>>>     - Is it CPU bound or GPU bound?
>>>>     - Did a CPU core frequency drop cause something to go slower than
>>>>     usual?
>>>>     - Is something else running that is stealing CPU or GPU time? Could I
>>>>     fix that with better thread/context priorities?
>>>>     - Are all CPU cores being used effectively? Do I need
>>>>     sched_setaffinity to keep my thread on a big or little core?
>>>>     - What’s the latency between CPU frame submit and GPU start?
>>>>
>>>> *What Does Mesa + Perfetto Provide?*
>>>>
>>>> Mesa is in a unique position to produce GPU trace data for several GPU
>>>> vendors without requiring the developer to build and install additional
>>>> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>>>>
>>>> The key is making it easy for developers to use. Ideally, perfetto is
>>>> eventually available by default in mesa so that if your system has
>>>> perfetto traced running, you just need to run perfetto (perhaps along
>>>> with setting an environment variable) with the mesa categories to see:
>>>>
>>>>     - GPU processing timeline events.
>>>>     - GPU counters.
>>>>     - CPU events for potentially slow functions in mesa like shader
>>>>     compiles.
>>>>
>>>> Example of what this data might look like (with fake GPU events):
>>>> [image: percetto-gpu-example.png]
>>>>
>>>> *Runtime Characteristics*
>>>>
>>>>     - ~500KB additional binary size. Even with using only the basic
>>>>     features of perfetto, it will increase the binary size of mesa by
>>>>     about 500KB.
>>>>     - Background thread. Perfetto uses a background thread for
>>>>     communication with the system tracing daemon (traced) to advertise
>>>>     trace data and get notification of trace start/stop.
>>>>     - Runtime overhead when disabled is designed to be optimal with one
>>>>     predicted branch, typically a few CPU cycles
>>>>     <https://perfetto.dev/docs/instrumentation/track-events#performance>
>>>>     per event. While enabled, the overhead can be around 1 us per event.
>>>>
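To make the "one predicted branch" point concrete, here is a minimal sketch of a trace macro whose disabled path is a single flag test. This is not Perfetto's implementation; trace_category, TRACE_EVENT and trace_emit_slow() are hypothetical names, and __builtin_expect is a GCC/Clang extension. Because the event name is a macro argument, it is only evaluated when the category is enabled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical category object; 'emitted' exists only so the sketch
 * has something observable. */
struct trace_category {
    atomic_bool enabled;
    unsigned emitted;
};

/* GCC/Clang hint: the tracing-enabled case is the unlikely one. */
#define TRACE_UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Slow path, only reached while a trace session is active. */
static void trace_emit_slow(struct trace_category *cat, const char *name)
{
    (void)name; /* a real implementation would serialize the event here */
    cat->emitted++;
}

/* 'name' is evaluated only when the category is enabled. */
#define TRACE_EVENT(cat, name)                                          \
    do {                                                                \
        if (TRACE_UNLIKELY(atomic_load_explicit(&(cat)->enabled,        \
                                                memory_order_relaxed))) \
            trace_emit_slow((cat), (name));                             \
    } while (0)

/* Stand-in for an argument that is costly to compute. */
static unsigned name_calls;
static const char *expensive_name(void)
{
    name_calls++;
    return "draw";
}
```

With the category disabled, TRACE_EVENT(&cat, expensive_name()) performs the relaxed load and the branch and never calls expensive_name().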
>>>> *Integration Challenges*
>>>>
>>>>     - The perfetto SDK is C++ and designed around macros, lambdas, inline
>>>>     templates, etc. There are ongoing discussions on providing an official
>>>>     perfetto C API, but it is not yet clear when this will land on the
>>>>     perfetto roadmap.
>>>>     - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>>>>     lines of code.
>>>>     - Anything that includes perfetto.h takes a long time to compile.
>>>>     - The current Perfetto SDK design is incompatible with being a shared
>>>>     library behind a C API.
>>>>
>>>> *Percetto*
>>>>
>>>> The percetto library <https://github.com/olvaffe/percetto> was recently
>>>> implemented to provide an interim C API for perfetto. It provides
>>>> efficient support for scoped trace events, multiple categories, counters,
>>>> custom timestamps, and debug data annotations. Percetto also provides
>>>> some features that are important to mesa, but not available yet with the
>>>> perfetto SDK:
>>>>
>>>>     - Trace events from multiple perfetto instances in separate shared
>>>>     libraries (like mesa and virglrenderer) show correctly in a single
>>>>     process and thread view.
>>>>     - Counter tracks and macro API.
>>>>
>>>> Percetto is missing API for perfetto's GPU DataSource and counter
>>>> support, but that feature could be implemented next if it is important
>>>> for mesa. With the existing percetto API mesa could present GPU trace
>>>> data as named 'slice' events and int64_t counters with custom timestamps
>>>> as shown in the image above (based on this sample
>>>> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
>>>>
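For readers unfamiliar with scoped events in C, the general idea can be sketched as below. This is not percetto's actual API (see its repository for that); struct slice, TRACE_SCOPE, slice_emit() and scope_end() are hypothetical names, and the cleanup attribute is a GCC/Clang extension. The sketch also shows a slice emitted with caller-supplied timestamps, which is how GPU work measured in another clock domain could be presented.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical slice record: a name plus begin/end timestamps in ns. */
struct slice {
    const char *name;
    uint64_t begin_ns;
    uint64_t end_ns;
};

static struct slice last_slice; /* stand-in for a real trace buffer */

static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC); /* C11; a tracer would use a monotonic clock */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Record a slice with caller-supplied timestamps, e.g. converted GPU times. */
static void slice_emit(const char *name, uint64_t begin_ns, uint64_t end_ns)
{
    last_slice = (struct slice){ name, begin_ns, end_ns };
}

/* Cleanup handler: runs when the scope variable goes out of scope. */
static void scope_end(struct slice *s)
{
    slice_emit(s->name, s->begin_ns, now_ns());
}

#define TRACE_SCOPE(label)                                       \
    struct slice trace_scope_                                    \
        __attribute__((cleanup(scope_end), unused)) = { (label), now_ns(), 0 }

static int compile_shader(int n)
{
    TRACE_SCOPE("compile_shader");
    if (n < 0)
        return -1; /* the slice is still closed on early return */
    return n * 2;
}
```

A driver could call slice_emit("gpu_draw", begin, end) with timestamps read back from the GPU, while CPU-side functions such as shader compiles use TRACE_SCOPE.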
>>>> *Mesa Integration Alternatives*
>>>>
>>>> Note: we have some pressing needs for performance analysis in Chrome OS,
>>>> so I'm intentionally leaving out the alternative of waiting for an
>>>> official perfetto C API. Of course, once that C API is available it would
>>>> become an option to migrate to it from any of the alternatives below.
>>>>
>>>> Ordered by difficulty with easiest first:
>>>>
>>>>     1. Statically link with percetto as an optional external dependency
>>>>     (virglrenderer now has this approach
>>>>     <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>>>>        - Pros: API already supports most common tracing needs. Tested and
>>>>        used by an increasing number of CrOS components.
>>>>        - Cons: External dependency for optional mesa build option.
>>>>     2. Embed Perfetto SDK + a Percetto fork/copy.
>>>>        - Pros: API already supports most common tracing needs. No added
>>>>        external dependency for mesa.
>>>>        - Cons: Percetto code divergence, bug fixes need to land in two
>>>>        trees.
>>>>     3. Embed Perfetto SDK + custom C wrapper.
>>>>        - Pros: Tailored API for mesa's needs.
>>>>        - Cons: Nontrivial development efforts and maintenance.
>>>>     4. Generate C stubs for the Perfetto protobuf and reimplement the
>>>>     Perfetto SDK in C.
>>>>        - Pros: Tailored API for mesa's needs. Possible smaller binary
>>>>        impact from simpler implementation.
>>>>        - Cons: Significant development efforts and maintenance.
>>>>
>>>> Regardless of the integration direction, I expect we would disable
>>>> perfetto in the default build for now to minimize disruption.
>>>>
>>>> I like #1, because there are some nontrivial subtleties to the C wrapper
>>>> that provide both API conveniences and runtime performance that would
>>>> need to be reimplemented or maintained with the other options. I will
>>>> also volunteer to do #1 or #2, but I'm not sure I have time for #3 or
>>>> #4 :D.
>>>>
>>>> Any other thoughts on how best to integrate perfetto into mesa?
>>>>
>>>> -jb
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



