[Mesa-dev] Perfetto CPU/GPU tracing

Lionel Landwerlin lionel.g.landwerlin at intel.com
Sat Feb 13 08:04:47 UTC 2021

On 13/02/2021 04:20, Rob Clark wrote:
> On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>> On 13/02/2021 03:38, Rob Clark wrote:
>>> On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
>>> <lionel.g.landwerlin at intel.com> wrote:
>>>> We're kind of in the same boat for Intel.
>>>> Access to GPU perf counters is exclusive to a single process if you want
>>>> to build a timeline of the work (because preemption etc...).
>>> ugh, does that mean extensions like AMD_performance_monitor don't
>>> actually work on intel?
>> It works, but only a single app can use it at a time.
> I see.. on the freedreno side we haven't really gone down the
> preemption route yet, but we have a way to hook in some save/restore
> cmdstream

That's why I think, for Intel HW, something like gfx-pps is probably 
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamps on the timeline to
annotate it.
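
To make the idea concrete, here is a minimal sketch of how driver-provided
timestamps could annotate a system-wide counter timeline. It is an
illustration only, not how gfx-pps or any driver does it. The function names
and sample numbers are hypothetical, and it assumes a cumulative counter that
grows roughly linearly between snapshots (so preemption inside an interval is
ignored).

```python
from bisect import bisect_right


def counter_at(snapshots, ts):
    """Linearly interpolate a cumulative counter value at time ts.

    snapshots: list of (timestamp, cumulative_value), sorted by timestamp.
    Values outside the sampled range are clamped to the first/last snapshot.
    """
    times = [t for t, _ in snapshots]
    if ts <= times[0]:
        return snapshots[0][1]
    if ts >= times[-1]:
        return snapshots[-1][1]
    i = bisect_right(times, ts)
    (t0, v0), (t1, v1) = snapshots[i - 1], snapshots[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)


def annotate(snapshots, intervals):
    """Return {label: counter delta} for each (label, start, end) interval."""
    return {
        label: counter_at(snapshots, end) - counter_at(snapshots, start)
        for label, start, end in intervals
    }


# Hypothetical data: global counter snapshots plus two driver-timestamped draws.
snapshots = [(0, 0), (10, 100), (20, 400)]
intervals = [("draw_a", 0, 10), ("draw_b", 10, 15)]
deltas = annotate(snapshots, intervals)
```

Because the counters are global, the deltas describe everything the GPU did
during each interval, not only the work of the application that supplied the
timestamps.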


>>>> The best information we could add from mesa would be a timestamp of when a
>>>> particular drawcall started.
>>>> But that's pretty much what timestamp queries are.
>>>> Were you thinking of particular GPU generated data you don't get from
>>>> gfx-pps?
>>> From the looks of it, currently I don't get *any* GPU generated data
>>> from gfx-pps ;-)
>> Maybe file a bug? :
>> https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc
>>> We can ofc sample counters from a separate process as well... I have a
>>> curses tool (fdperf) which does this.. but running outside of gpu
>>> cmdstream plus counters losing context across suspend/resume makes it
>>> less than perfect.
>> Our counters are global so to give per application values, we need to
>> post process a stream of HW counter snapshots.
>>>     And something that works the same way as
>>> AMD_performance_monitor under the hood gives a more precise look at
>>> which shaders (for ex) are consuming the most cycles.
>> In our implementation that precision (in particular when a drawcall
>> ends) comes at a stalling cost unfortunately.
> yeah, stalling on our end too for per-draw counter snapshots.. but if
> you are looking for which shaders to optimize that doesn't matter
> *that* much.. there'll be some overhead, but it's not really going to
> change which draws/shaders are expensive.. it just means that you lose out
> on pipelining of the state changes
> BR,
> -R
>>>     For cases where
>>> we can profile a trace, frameretrace and related tools are pretty
>>> great.. but it would be nice to have similar visibility for actual
>>> games (which for me, mostly means android games, since so far no
>>> aarch64 steam store), but also give game developers good tools (or at
>>> least the same tools that they get with other closed src drivers on
>>> android).
>> Sure, but frame analysis is different than live monitoring of the system.
>> On Intel's HW you don't get the same level of detail in both cases, and
>> apart from a few timestamps, I think gfx-pps is as good as you're going to
>> get for live stuff.
>> -Lionel
>>> BR,
>>> -R
>>>> Thanks,
>>>> -Lionel
>>>> On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
>>>>> My 2c for Mali/Panfrost --
>>>>> For us, capturing GPU perf counters is orthogonal to rendering. It's
>>>>> expected (e.g. with Arm's tools) to do this from a separate process.
>>>>> Neither Mesa nor the DDK should require custom instrumentation for the
>>>>> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
>>>>> Perfetto as it is. So for us I don't see the value in modifying Mesa for
>>>>> tracing.
>>>>> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
>>>>>> (responding from correct address this time)
>>>>>> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.janes at intel.com> wrote:
>>>>>>> I've recently been using GPUVis to look at trace events.  On Intel
>>>>>>> platforms, GPUVis incorporates ftrace events from the i915 driver,
>>>>>>> performance metrics from igt-gpu-tools, and userspace ftrace markers
>>>>>>> that I locally hack up in Mesa.
>>>>>> GPUVis is great. I would love to see that data combined with
>>>>>> userspace events without any need for local hacks. Perfetto provides
>>>>>> on-demand trace events with lower overhead compared to ftrace, so for
>>>>>> example it is acceptable to have production trace instrumentation that can
>>>>>> be captured without dev builds. To do that with ftrace it may require a way
>>>>>> to enable and disable the ftrace file writes to avoid the overhead when
>>>>>> tracing is not in use. This is what Android does with systrace/atrace, for
>>>>>> example, it uses Binder to notify processes about trace sessions. Perfetto
>>>>>> does that in a more portable way.
>>>>>>> It is very easy to compile the GPUVis UI.  Userspace instrumentation
>>>>>>> requires a single C/C++ header.  You don't have to access an external
>>>>>>> web service to analyze trace data (a big no-no for devs working on
>>>>>>> preproduction hardware).
>>>>>>> Is it possible to build and run the Perfetto UI locally?
>>>>>> Yes, local UI builds are possible
>>>>>> <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
>>>>>> Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
>>>>>> trace data is not uploaded unless you use the 'share' feature.
>>>>>>>      Can it display
>>>>>>> arbitrary trace events that are written to
>>>>>>> /sys/kernel/tracing/trace_marker ?
>>>>>> Yes, I believe it does support that via linux.ftrace data source
>>>>>> <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
>>>>>> example to overlay CPU sched data to show what process is on each core
>>>>>> throughout the timeline. There are many ftrace event types
>>>>>> <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
>>>>>> in the perfetto protos.
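
For reference, a userspace marker of the kind discussed above is just a string
written to the trace_marker file. The sketch below is a minimal, hedged
example: the helper name is hypothetical, the path is the one quoted above
(older setups may expose it under /sys/kernel/debug/tracing instead), and
writing usually requires elevated permissions.

```python
import os

# Path as quoted in the thread; an assumption about the local tracefs mount.
TRACE_MARKER = "/sys/kernel/tracing/trace_marker"


def write_marker(message, path=TRACE_MARKER):
    """Write one marker string; return False if the file cannot be written."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        # tracefs not mounted, or insufficient permissions
        return False
    try:
        os.write(fd, message.encode())
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Calling write_marker("frame 42 submitted") from instrumented code would then
show up as a marker entry in the ftrace buffer, provided the file is writable.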
>>>>>>> Can it be extended to show i915 and
>>>>>>> i915-perf-recorder events?
>>>>>> It can be extended to consume custom data sources. One way this is done is
>>>>>> via a bridge daemon, such as traced_probes which is responsible for
>>>>>> capturing data from ftrace and /proc during a trace session and sending it
>>>>>> to traced. traced is the main perfetto tracing daemon that notifies all
>>>>>> trace data sources to start/stop tracing and communicates with user tracing
>>>>>> requests via the 'perfetto' command.
>>>>>>> John Bates <jbates at chromium.org> writes:
>>>>>>>> I recently opened issue 4262
>>>>>>>> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
>>>>>>>> discussion on integrating perfetto into mesa.
>>>>>>>> *Background*
>>>>>>>> System-wide tracing is an invaluable tool for developers to find and fix
>>>>>>>> performance problems. The perfetto project enables a combined view of
>>>>>>>> trace data from kernel ftrace, GPU driver and various manually-instrumented
>>>>>>>> tracepoints throughout the application and system. This helps developers
>>>>>>>> quickly answer questions like:
>>>>>>>>       - How long are frames taking?
>>>>>>>>       - What caused a particular frame drop?
>>>>>>>>       - Is it CPU bound or GPU bound?
>>>>>>>>       - Did a CPU core frequency drop cause something to go slower than
>>>>>>>>       usual?
>>>>>>>>       - Is something else running that is stealing CPU or GPU time? Could I
>>>>>>>>       fix that with better thread/context priorities?
>>>>>>>>       - Are all CPU cores being used effectively? Do I need
>>>>>>>>       sched_setaffinity to keep my thread on a big or little core?
>>>>>>>>       - What’s the latency between CPU frame submit and GPU start?
>>>>>>>> *What Does Mesa + Perfetto Provide?*
>>>>>>>> Mesa is in a unique position to produce GPU trace data for several GPU
>>>>>>>> vendors without requiring the developer to build and install additional
>>>>>>>> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>>>>>>>> The key is making it easy for developers to use. Ideally, perfetto is
>>>>>>>> eventually available by default in mesa so that if your system has
>>>>>>>> perfetto traced running, you just need to run perfetto (perhaps along with
>>>>>>>> setting an environment variable) with the mesa categories to see:
>>>>>>>>       - GPU processing timeline events.
>>>>>>>>       - GPU counters.
>>>>>>>>       - CPU events for potentially slow functions in mesa like shader
>>>>>>>>       compiles.
>>>>>>>> Example of what this data might look like (with fake GPU events):
>>>>>>>> [image: percetto-gpu-example.png]
>>>>>>>> *Runtime Characteristics*
>>>>>>>>       - ~500KB additional binary size. Even with using only the basic
>>>>>>>>       features of perfetto, it will increase the binary size of mesa by
>>>>>>>>       about 500KB.
>>>>>>>>       - Background thread. Perfetto uses a background thread for
>>>>>>>>       communication with the system tracing daemon (traced) to advertise
>>>>>>>>       trace data and get notification of trace start/stop.
>>>>>>>>       - Runtime overhead when disabled is designed to be optimal with one
>>>>>>>>       predicted branch, typically a few CPU cycles
>>>>>>>>       <https://perfetto.dev/docs/instrumentation/track-events#performance>
>>>>>>>>       per event. While enabled, the overhead can be around 1 us per event.
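
The "one branch when disabled" design can be sketched structurally as below.
This is not the perfetto or percetto API: the Category class and trace_scope
helper are hypothetical names, and a Python context manager costs far more
than a predicted branch in C++. The sketch only shows the shape of the fast
path, where a disabled category returns before any timestamp is taken.

```python
import time
from contextlib import contextmanager


class Category:
    """Hypothetical trace category with an on/off switch."""

    def __init__(self, name):
        self.name = name
        self.enabled = False  # flipped when a trace session starts or stops
        self.events = []      # stand-in for a real trace buffer


@contextmanager
def trace_scope(category, name):
    """Record a scoped event only while the category is enabled."""
    if not category.enabled:
        # Disabled path: a single check, no timestamps, nothing recorded.
        yield
        return
    start = time.monotonic_ns()
    try:
        yield
    finally:
        category.events.append((name, start, time.monotonic_ns() - start))


shader = Category("shader")
with trace_scope(shader, "compile"):
    pass  # tracing is off, so nothing is recorded here
```

Setting shader.enabled = True (as a trace-start notification would) makes the
same call site start recording (name, start, duration) tuples.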
>>>>>>>> *Integration Challenges*
>>>>>>>>       - The perfetto SDK is C++ and designed around macros, lambdas, inline
>>>>>>>>       templates, etc. There are ongoing discussions on providing an official
>>>>>>>>       perfetto C API, but it is not yet clear when this will land on the
>>>>>>>>       perfetto roadmap.
>>>>>>>>       - The perfetto SDK is an amalgamated .h and .cc that adds up to 100K
>>>>>>>>       lines of code.
>>>>>>>>       - Anything that includes perfetto.h takes a long time to compile.
>>>>>>>>       - The current Perfetto SDK design is incompatible with being a shared
>>>>>>>>       library behind a C API.
>>>>>>>> *Percetto*
>>>>>>>> The percetto library <https://github.com/olvaffe/percetto> was recently
>>>>>>>> implemented to provide an interim C API for perfetto. It provides
>>>>>>>> efficient support for scoped trace events, multiple categories, counters,
>>>>>>>> custom timestamps, and debug data annotations. Percetto also provides some
>>>>>>>> features that are important to mesa, but not available yet with perfetto
>>>>>>>> SDK:
>>>>>>>>       - Trace events from multiple perfetto instances in separate shared
>>>>>>>>       libraries (like mesa and virglrenderer) show correctly in a single
>>>>>>>>       process and thread view.
>>>>>>>>       - Counter tracks and macro API.
>>>>>>>> Percetto is missing API for perfetto's GPU DataSource and counter
>>>>>>>> support, but that feature could be implemented next if it is important
>>>>>>>> for mesa.
>>>>>>>> With the existing percetto API mesa could present GPU trace data as named
>>>>>>>> 'slice' events and int64_t counters with custom timestamps as shown in
>>>>>>>> the image above (based on this sample
>>>>>>>> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
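
To illustrate what "named slice events and integer counters with custom
timestamps" look like as data, here is a small sketch that emits them in the
Chrome Trace Event JSON format, which the Perfetto UI can open. This is a
stand-in for illustration and does not use the percetto API. The helper names,
the "gpu" category and the pid/tid defaults are assumptions.

```python
import json


def slice_event(name, ts_us, dur_us, pid=1, tid=1, cat="gpu"):
    """Complete ('X') event: a named slice with an explicit start and duration."""
    return {"name": name, "cat": cat, "ph": "X",
            "ts": ts_us, "dur": dur_us, "pid": pid, "tid": tid}


def counter_event(name, ts_us, value, pid=1):
    """Counter ('C') event: an integer sample at an explicit timestamp."""
    return {"name": name, "ph": "C", "ts": ts_us, "pid": pid,
            "args": {name: int(value)}}


def build_trace(events):
    """Serialize events, ordered by timestamp, as a JSON trace document."""
    return json.dumps({"traceEvents": sorted(events, key=lambda e: e["ts"])})
```

Timestamps are in microseconds and are supplied by the caller, so GPU-side
timestamps can be placed on the timeline after the fact, for example
build_trace([slice_event("draw", 10, 20), counter_event("gpu_busy", 50, 3)]).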
>>>>>>>> *Mesa Integration Alternatives*
>>>>>>>> Note: we have some pressing needs for performance analysis in Chrome OS,
>>>>>>>> so I'm intentionally leaving out the alternative of waiting for an official
>>>>>>>> perfetto C API. Of course, once that C API is available it would become
>>>>>>>> an option to migrate to it from any of the alternatives below.
>>>>>>>> Ordered by difficulty with easiest first:
>>>>>>>>       1. Statically link with percetto as an optional external dependency
>>>>>>>>       (virglrenderer now has this approach
>>>>>>>>       <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
>>>>>>>>          - Pros: API already supports most common tracing needs. Tested and
>>>>>>>>          used by an increasing number of CrOS components.
>>>>>>>>          - Cons: External dependency for optional mesa build option.
>>>>>>>>       2. Embed Perfetto SDK + a Percetto fork/copy.
>>>>>>>>          - Pros: API already supports most common tracing needs. No added
>>>>>>>>          external dependency for mesa.
>>>>>>>>          - Cons: Percetto code divergence, bug fixes need to land in two
>>>>>>>>          trees.
>>>>>>>>       3. Embed Perfetto SDK + custom C wrapper.
>>>>>>>>          - Pros: Tailored API for mesa's needs.
>>>>>>>>          - Cons: Nontrivial development efforts and maintenance.
>>>>>>>>       4. Generate C stubs for the Perfetto protobuf and reimplement the
>>>>>>>>       Perfetto SDK in C.
>>>>>>>>          - Pros: Tailored API for mesa's needs. Possible smaller binary
>>>>>>>>          impact from simpler implementation.
>>>>>>>>          - Cons: Significant development efforts and maintenance.
>>>>>>>> Regardless of the integration direction, I expect we would disable
>>>>>>>> perfetto in the default build for now to minimize disruption.
>>>>>>>> I like #1, because there are some nontrivial subtleties to the C wrapper
>>>>>>>> that provide both API conveniences and runtime performance that would
>>>>>>>> need to be reimplemented or maintained with the other options. I will also
>>>>>>>> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
>>>>>>>> Any other thoughts on how best to integrate perfetto into mesa?
>>>>>>>> -jb
>>>>>>>> _______________________________________________
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev