[Mesa-dev] Perfetto CPU/GPU tracing

Rob Clark robdclark at gmail.com
Sat Feb 13 02:20:58 UTC 2021


On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
<lionel.g.landwerlin at intel.com> wrote:
>
> On 13/02/2021 03:38, Rob Clark wrote:
> > On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
> > <lionel.g.landwerlin at intel.com> wrote:
> >> We're kind of in the same boat for Intel.
> >>
> >> Access to GPU perf counters is exclusive to a single process if you want
> >> to build a timeline of the work (because preemption etc...).
> > ugg, does that mean extensions like AMD_performance_monitor don't
> > actually work on intel?
>
>
> It works, but only a single app can use it at a time.
>

I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some save/restore
cmdstream.

>
> >
> > >> The best information we could add from mesa would be a timestamp of when a
> > >> particular drawcall started.
> > >> But that's pretty much what timestamp queries are.
> >>
> >> Were you thinking of particular GPU generated data you don't get from
> >> gfx-pps?
> > > From the looks of it, currently I don't get *any* GPU generated data
> > from gfx-pps ;-)
>
>
> Maybe file a bug? :
> https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc
>
>
> >
> > We can ofc sample counters from a separate process as well... I have a
> > curses tool (fdperf) which does this.. but running outside of gpu
> > cmdstream plus counters losing context across suspend/resume makes it
> > less than perfect.
>
>
> Our counters are global so to give per application values, we need to
> post process a stream of HW counter snapshots.
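To make that post-processing step concrete, here is a minimal C sketch. It assumes a hypothetical snapshot layout in which each HW report carries a timestamp, the ID of the context that is active on the GPU from that point on, and a global, monotonically increasing counter value; real Intel OA report formats differ. Each delta between consecutive snapshots is credited to the context that was active during that interval.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot layout; real HW report formats differ. */
struct counter_snapshot {
   uint64_t timestamp_ns; /* when the snapshot was taken */
   uint32_t ctx_id;       /* context running on the GPU from this point on */
   uint64_t value;        /* global, monotonically increasing counter */
};

/* Credit each inter-snapshot delta to the context that was active during
 * that interval. per_ctx must hold max_ctx entries and is accumulated into,
 * so the caller zeroes it first. Out-of-range context IDs are ignored.
 * Work after the last snapshot is not accounted for. */
static void
attribute_counter_deltas(const struct counter_snapshot *snaps, size_t count,
                         uint64_t *per_ctx, uint32_t max_ctx)
{
   for (size_t i = 1; i < count; i++) {
      uint32_t ctx = snaps[i - 1].ctx_id;
      if (ctx >= max_ctx)
         continue;
      /* Unsigned subtraction also gives the right delta after one wrap. */
      per_ctx[ctx] += snaps[i].value - snaps[i - 1].value;
   }
}
```

Because the subtraction is done on unsigned 64-bit values, a counter that wraps once between two snapshots still yields the correct delta.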
>
>
> >    And something that works the same way as
> > AMD_performance_monitor under the hood gives a more precise look at
> > which shaders (for ex) are consuming the most cycles.
>
>
> In our implementation that precision (in particular when a drawcall
> ends) comes at a stalling cost unfortunately.

yeah, stalling on our end too for per-draw counter snapshots.. but if
you are looking for which shaders to optimize that doesn't matter
*that* much.. there'll be some overhead, but it's not really going to
change which draws/shaders are expensive.. it just means that you lose out
on pipelining of the state changes.

BR,
-R
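The per-draw measurement described above can be sketched the same way. This assumes a hypothetical record holding the cycle counter sampled before and after each draw, with a stall in between so the delta belongs to that draw alone. Summing per shader and picking the maximum gives the ranking that, per the argument above, the stall overhead does not really change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-draw record: counter sampled before and after the draw,
 * with a stall in between so the delta belongs to this draw alone. */
struct draw_sample {
   uint32_t shader_id;
   uint64_t cycles_begin;
   uint64_t cycles_end;
};

/* Sum cycles per shader and return the ID of the most expensive one.
 * cycles_per_shader must hold num_shaders zero-initialized entries and
 * num_shaders must be > 0. Draws with out-of-range shader IDs are skipped. */
static uint32_t
most_expensive_shader(const struct draw_sample *draws, size_t num_draws,
                      uint64_t *cycles_per_shader, uint32_t num_shaders)
{
   for (size_t i = 0; i < num_draws; i++) {
      if (draws[i].shader_id < num_shaders)
         cycles_per_shader[draws[i].shader_id] +=
            draws[i].cycles_end - draws[i].cycles_begin;
   }

   uint32_t best = 0;
   for (uint32_t s = 1; s < num_shaders; s++) {
      if (cycles_per_shader[s] > cycles_per_shader[best])
         best = s;
   }
   return best;
}
```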

>
> >    For cases where
> > we can profile a trace, frameretrace and related tools are pretty
> > great.. but it would be nice to have similar visibility for actual
> > games (which for me, mostly means android games, since so far no
> > aarch64 steam store), but also give game developers good tools (or at
> > least the same tools that they get with other closed src drivers on
> > android).
>
>
> Sure, but frame analysis is different than live monitoring of the system.
>
> On Intel's HW you don't get the same level of detail in both cases, and
> apart from a few timestamps, I think gfx-pps is as good as you're gonna get
> for live stuff.
>
>
> -Lionel
>
>
> >
> > BR,
> > -R
> >
> >> Thanks,
> >>
> >> -Lionel
> >>
> >>
> >> On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
> >>> My 2c for Mali/Panfrost --
> >>>
> >>> For us, capturing GPU perf counters is orthogonal to rendering. It's
> >>> expected (e.g. with Arm's tools) to do this from a separate process.
> >>> Neither Mesa nor the DDK should require custom instrumentation for the
> >>> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> >>> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> >>> tracing.
> >>>
> >>> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> >>>> (responding from correct address this time)
> >>>>
> >>>> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.janes at intel.com> wrote:
> >>>>
> >>>>> I've recently been using GPUVis to look at trace events.  On Intel
> >>>>> platforms, GPUVis incorporates ftrace events from the i915 driver,
> >>>>> performance metrics from igt-gpu-tools, and userspace ftrace markers
> >>>>> that I locally hack up in Mesa.
> >>>>>
> >>>> GPUVis is great. I would love to see that data combined with
> >>>> userspace events without any need for local hacks. Perfetto provides
> >>>> on-demand trace events with lower overhead compared to ftrace, so for
> >>>> example it is acceptable to have production trace instrumentation that can
> >>>> be captured without dev builds. Doing that with ftrace would require a way
> >>>> to enable and disable the ftrace file writes to avoid the overhead when
> >>>> tracing is not in use. This is what Android does with systrace/atrace: for
> >>>> example, it uses Binder to notify processes about trace sessions. Perfetto
> >>>> does that in a more portable way.
> >>>>
> >>>>
> >>>>> It is very easy to compile the GPUVis UI.  Userspace instrumentation
> >>>>> requires a single C/C++ header.  You don't have to access an external
> >>>>> web service to analyze trace data (a big no-no for devs working on
> >>>>> preproduction hardware).
> >>>>>
> >>>>> Is it possible to build and run the Perfetto UI locally?
> >>>> Yes, local UI builds are possible
> >>>> <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> >>>> Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> >>>> trace data is not uploaded unless you use the 'share' feature.
> >>>>
> >>>>
> >>>>>     Can it display
> >>>>> arbitrary trace events that are written to
> >>>>> /sys/kernel/tracing/trace_marker ?
> >>>> Yes, I believe it does support that via the linux.ftrace data source
> >>>> <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that, for
> >>>> example, to overlay CPU sched data to show what process is on each core
> >>>> throughout the timeline. There are many ftrace event types
> >>>> <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> >>>> in the perfetto protos.
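For the kind of userspace ftrace markers mentioned above, here is a minimal C sketch of writing to trace_marker. It assumes tracefs is mounted at /sys/kernel/tracing (older setups expose it under /sys/kernel/debug/tracing) and uses the atrace-style "B|pid|name" / "E|pid" text that, as far as I know, Perfetto's trace processor turns into slices. The helper names are hypothetical. Each marker is emitted with a single write(2) so the kernel records it as one event.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the ftrace marker file once at startup, e.g.
 * "/sys/kernel/tracing/trace_marker". Returns -1 on failure (tracefs not
 * mounted, no permission), in which case markers are simply dropped. */
static int
trace_marker_open(const char *path)
{
   return open(path, O_WRONLY | O_CLOEXEC);
}

/* Hypothetical helper: format one marker and emit it with a single write(2)
 * so the kernel records it as one print event. Returns 0 on success. */
static int
trace_marker_printf(int fd, const char *fmt, ...)
{
   char buf[256];
   va_list args;

   if (fd < 0)
      return -1;

   va_start(args, fmt);
   int len = vsnprintf(buf, sizeof(buf), fmt, args);
   va_end(args);
   if (len < 0)
      return -1;
   if ((size_t)len >= sizeof(buf))
      len = (int)sizeof(buf) - 1; /* truncate overly long markers */

   ssize_t written = write(fd, buf, (size_t)len);
   return written == (ssize_t)len ? 0 : -1;
}
```

Typical use would be to open the marker file once and bracket a slow operation, such as a shader compile, with a "B|pid|name" write before it and an "E|pid" write after it.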
> >>>>
> >>>>
> >>>>> Can it be extended to show i915 and
> >>>>> i915-perf-recorder events?
> >>>>>
> >>>> It can be extended to consume custom data sources. One way this is done is
> >>>> via a bridge daemon, such as traced_probes, which is responsible for
> >>>> capturing data from ftrace and /proc during a trace session and sending it
> >>>> to traced. traced is the main perfetto tracing daemon that notifies all
> >>>> trace data sources to start/stop tracing and communicates with user tracing
> >>>> requests via the 'perfetto' command.
> >>>>
> >>>>
> >>>>
> >>>>> John Bates <jbates at chromium.org> writes:
> >>>>>
> >>>>>> I recently opened issue 4262
> >>>>>> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> >>>>>> discussion on integrating perfetto into mesa.
> >>>>>>
> >>>>>> *Background*
> >>>>>>
> >>>>>> System-wide tracing is an invaluable tool for developers to find and fix
> >>>>>> performance problems. The perfetto project enables a combined view of trace
> >>>>>> data from kernel ftrace, GPU driver and various manually-instrumented
> >>>>>> tracepoints throughout the application and system. This helps developers
> >>>>>> quickly answer questions like:
> >>>>>>
> >>>>>>      - How long are frames taking?
> >>>>>>      - What caused a particular frame drop?
> >>>>>>      - Is it CPU bound or GPU bound?
> >>>>>>      - Did a CPU core frequency drop cause something to go slower than usual?
> >>>>>>      - Is something else running that is stealing CPU or GPU time? Could I
> >>>>>>      fix that with better thread/context priorities?
> >>>>>>      - Are all CPU cores being used effectively? Do I need sched_setaffinity
> >>>>>>      to keep my thread on a big or little core?
> >>>>>>      - What’s the latency between CPU frame submit and GPU start?
> >>>>>>
> >>>>>> *What Does Mesa + Perfetto Provide?*
> >>>>>>
> >>>>>> Mesa is in a unique position to produce GPU trace data for several GPU
> >>>>>> vendors without requiring the developer to build and install additional
> >>>>>> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> >>>>>>
> >>>>>> The key is making it easy for developers to use. Ideally, perfetto is
> >>>>>> eventually available by default in mesa so that if your system has perfetto
> >>>>>> traced running, you just need to run perfetto (perhaps along with setting
> >>>>>> an environment variable) with the mesa categories to see:
> >>>>>>
> >>>>>>      - GPU processing timeline events.
> >>>>>>      - GPU counters.
> >>>>>>      - CPU events for potentially slow functions in mesa like shader compiles.
> >>>>>> Example of what this data might look like (with fake GPU events):
> >>>>>> [image: percetto-gpu-example.png]
> >>>>>>
> >>>>>> *Runtime Characteristics*
> >>>>>>
> >>>>>>      - ~500KB additional binary size. Even using only the basic features
> >>>>>>      of perfetto, it will increase the binary size of mesa by about 500KB.
> >>>>>>      - Background thread. Perfetto uses a background thread for communication
> >>>>>>      with the system tracing daemon (traced) to advertise trace data and get
> >>>>>>      notification of trace start/stop.
> >>>>>>      - Runtime overhead when disabled is designed to be minimal: one
> >>>>>>      predicted branch, typically a few CPU cycles
> >>>>>>      <https://perfetto.dev/docs/instrumentation/track-events#performance>
> >>>>>>      per event. While enabled, the overhead can be around 1 us per event.
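The "one predicted branch" cost when tracing is disabled comes from a pattern roughly like the following C sketch. It is not Perfetto's actual code: the flag, the slow path and the counter are hypothetical, and __builtin_expect is a GCC/Clang extension. A background thread (in Perfetto's case, the one talking to traced) would flip the flag when a session starts or stops, and every instrumentation point only pays for a relaxed atomic load and a branch while the flag is off.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global flag, flipped by a background thread when the
 * tracing daemon starts or stops a session. Zero-initialized (off). */
static atomic_bool tracing_enabled;

/* Hypothetical counter standing in for a real trace buffer.
 * Not thread-safe; this is a sketch only. */
static uint64_t events_recorded;

/* Slow path: only reached while a session is active. A real backend
 * would serialize the event into a shared-memory buffer here. */
static void
record_event_slow(const char *name)
{
   (void)name;
   events_recorded++;
}

/* Fast path: when tracing is off this costs one relaxed load and one
 * branch that the CPU predicts as not taken. */
static inline void
trace_event(const char *name)
{
   if (__builtin_expect(atomic_load_explicit(&tracing_enabled,
                                             memory_order_relaxed), 0))
      record_event_slow(name);
}
```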
> >>>>>>
> >>>>>> *Integration Challenges*
> >>>>>>
> >>>>>>      - The perfetto SDK is C++ and designed around macros, lambdas, inline
> >>>>>>      templates, etc. There are ongoing discussions on providing an official
> >>>>>>      perfetto C API, but it is not yet clear when this will land on the
> >>>>>>      perfetto roadmap.
> >>>>>>      - The perfetto SDK is an amalgamated .h and .cc that add up to 100K
> >>>>>>      lines of code.
> >>>>>>      - Anything that includes perfetto.h takes a long time to compile.
> >>>>>>      - The current Perfetto SDK design is incompatible with being a shared
> >>>>>>      library behind a C API.
> >>>>>>
> >>>>>> *Percetto*
> >>>>>>
> >>>>>> The percetto library <https://github.com/olvaffe/percetto> was recently
> >>>>>> implemented to provide an interim C API for perfetto. It provides efficient
> >>>>>> support for scoped trace events, multiple categories, counters, custom
> >>>>>> timestamps, and debug data annotations. Percetto also provides some
> >>>>>> features that are important to mesa, but not yet available in the perfetto
> >>>>>> SDK:
> >>>>>>
> >>>>>>      - Trace events from multiple perfetto instances in separate shared
> >>>>>>      libraries (like mesa and virglrenderer) show correctly in a single process
> >>>>>>      and thread view.
> >>>>>>      - Counter tracks and macro API.
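Scoped trace events are awkward in C because there are no destructors. One common approach, sketched here under the assumption of GCC or Clang, is the cleanup variable attribute: the macro emits a begin event immediately and the compiler calls an end handler on every exit from the enclosing block, including early returns. The event sink, macro and function names are hypothetical and are not percetto's API.

```c
#include <stddef.h>

/* Hypothetical event sink: a real backend would forward these to the
 * tracing daemon. Here events are just logged in order. */
#define MAX_LOG 16
static const char *event_log[MAX_LOG];
static size_t event_log_len;

static void
emit(const char *what)
{
   if (event_log_len < MAX_LOG)
      event_log[event_log_len++] = what;
}

struct scoped_event {
   const char *name;
};

/* Called automatically when the guard variable goes out of scope. */
static void
scoped_event_end(struct scoped_event *ev)
{
   (void)ev;
   emit("end");
}

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)

/* Emit the begin event now and "end" on every exit from the enclosing
 * block, using the GCC/Clang cleanup attribute. */
#define SCOPED_TRACE_EVENT(name)                                    \
   struct scoped_event CONCAT(scoped_ev_, __LINE__)                 \
      __attribute__((cleanup(scoped_event_end))) = { (emit(name), name) }

/* Hypothetical instrumented function: the early return still ends the event. */
static int
compile_shader(int fail)
{
   SCOPED_TRACE_EVENT("compile_shader");
   if (fail)
      return -1;
   emit("work");
   return 0;
}
```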
> >>>>>>
> >>>>>> Percetto is missing an API for perfetto's GPU DataSource and counter support,
> >>>>>> but that feature could be implemented next if it is important for mesa.
> >>>>>> With the existing percetto API mesa could present GPU trace data as named
> >>>>>> 'slice' events and int64_t counters with custom timestamps as shown in the
> >>>>>> image above (based on this sample
> >>>>>> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
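Presenting GPU work as slices with custom timestamps requires translating GPU timestamp ticks into the CPU clock used by the trace. A minimal C sketch, assuming the driver can obtain one correlated pair (a GPU tick value and a CPU clock reading such as CLOCK_MONOTONIC or CLOCK_BOOTTIME in nanoseconds, sampled at about the same time) and knows the GPU timestamp frequency. The struct and function names are hypothetical, and drift between the two clocks is ignored; a real implementation would resample the pair periodically.

```c
#include <stdint.h>

/* Hypothetical correlation point between the GPU and CPU clocks. */
struct clock_sync {
   uint64_t gpu_ticks;   /* GPU timestamp at the sync point */
   uint64_t cpu_ns;      /* CPU clock reading (ns) at about the same time */
   uint64_t gpu_freq_hz; /* GPU timestamp counter frequency */
};

/* Convert a GPU tick value to the CPU clock domain. The delta is split into
 * whole seconds and a remainder so the multiplication does not overflow for
 * large deltas (the remainder step assumes a frequency below ~18 GHz).
 * Assumes gpu_ticks >= sync->gpu_ticks. */
static uint64_t
gpu_ticks_to_cpu_ns(const struct clock_sync *sync, uint64_t gpu_ticks)
{
   uint64_t delta = gpu_ticks - sync->gpu_ticks;
   uint64_t secs = delta / sync->gpu_freq_hz;
   uint64_t rem = delta % sync->gpu_freq_hz;
   return sync->cpu_ns + secs * 1000000000ull +
          rem * 1000000000ull / sync->gpu_freq_hz;
}
```

With both ends in the same clock domain, the "latency between CPU frame submit and GPU start" question from the list above becomes a subtraction of the submit time from the converted GPU start time.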
> >>>>>>
> >>>>>> *Mesa Integration Alternatives*
> >>>>>>
> >>>>>> Note: we have some pressing needs for performance analysis in Chrome OS, so
> >>>>>> I'm intentionally leaving out the alternative of waiting for an official
> >>>>>> perfetto C API. Of course, once that C API is available it would become an
> >>>>>> option to migrate to it from any of the alternatives below.
> >>>>>>
> >>>>>> Ordered by difficulty with easiest first:
> >>>>>>
> >>>>>>      1. Statically link with percetto as an optional external dependency
> >>>>>>      (virglrenderer now has this approach
> >>>>>>      <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> >>>>>>         - Pros: API already supports most common tracing needs. Tested and used
> >>>>>>         by an increasing number of CrOS components.
> >>>>>>         - Cons: External dependency for optional mesa build option.
> >>>>>>      2. Embed Perfetto SDK + a Percetto fork/copy.
> >>>>>>         - Pros: API already supports most common tracing needs. No added
> >>>>>>         external dependency for mesa.
> >>>>>>         - Cons: Percetto code divergence; bug fixes need to land in two trees.
> >>>>>>      3. Embed Perfetto SDK + custom C wrapper.
> >>>>>>         - Pros: Tailored API for mesa's needs.
> >>>>>>         - Cons: Nontrivial development efforts and maintenance.
> >>>>>>      4. Generate C stubs for the Perfetto protobuf and reimplement the
> >>>>>>      Perfetto SDK in C.
> >>>>>>         - Pros: Tailored API for mesa's needs. Possible smaller binary impact
> >>>>>>         from simpler implementation.
> >>>>>>         - Cons: Significant development efforts and maintenance.
> >>>>>>
> >>>>>> Regardless of the integration direction, I expect we would disable perfetto
> >>>>>> in the default build for now to minimize disruption.
> >>>>>>
> >>>>>> I like #1, because there are some nontrivial subtleties to the C wrapper
> >>>>>> that provide both API conveniences and runtime performance that would need
> >>>>>> to be reimplemented or maintained with the other options. I will also
> >>>>>> volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4 :D.
> >>>>>>
> >>>>>> Any other thoughts on how best to integrate perfetto into mesa?
> >>>>>>
> >>>>>> -jb
> >>>>>> _______________________________________________
> >>>>>> mesa-dev mailing list
> >>>>>> mesa-dev at lists.freedesktop.org
> >>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
>
>

