[Mesa-dev] Perfetto CPU/GPU tracing
mark.a.janes at intel.com
Sat Feb 13 00:51:08 UTC 2021
Rob Clark <robdclark at gmail.com> writes:
> On Fri, Feb 12, 2021 at 5:01 AM Tamminen, Eero T
> <eero.t.tamminen at intel.com> wrote:
>> On Thu, 2021-02-11 at 17:39 -0800, John Bates wrote:
>> > I recently opened issue 4262
>> > <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
>> > discussion on integrating perfetto into mesa.
>> > *Background*
>> > System-wide tracing is an invaluable tool for developers to find and
>> > fix
>> > performance problems. The perfetto project enables a combined view of
>> > trace
>> > data from kernel ftrace, GPU driver and various manually-instrumented
>> > tracepoints throughout the application and system.
>> Unlike some other Linux tracing solutions, Perfetto appears to be for
>> Android / Chrome(OS?), and not available in common Linux distros
> I don't think there is anything about perfetto that would not be
> usable in a generic linux distro.. and mesa support for perfetto would
> perhaps be a compelling reason for distros to add support
>> So, why Perfetto instead of one of the other solutions, e.g. from ones
>> mentioned here:
>> And, if tracing API is added to Mesa, shouldn't it support also
>> tracepoints for other tracing solutions?
> perfetto does have systrace collectors
> And a general comment on perfetto vs other things.. we end up needing
> to support perfetto regardless (for android and CrOS).. we don't
> *need* to enable it on generic linux, but I think we should (but maybe
> using the mode that does not require a system server.. at least
> initially.. that may limit its ability to collect systrace and traces
> from other parts of the system, but that wouldn't depend on distros
> enabling perfetto system server).
Perfetto seems like an awful lot of infrastructure to capture trace
events. Why not follow the example of GPUVis, and write generic
trace_markers to ftrace? It limits impact to Mesa, while allowing any
trace visualizer to use the trace points.
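To make that concrete, here is a minimal sketch of writing a marker to
ftrace from userspace. It assumes tracefs is mounted at the usual
/sys/kernel/tracing path (with the older debugfs location as a
fallback) and that the process has permission to write there. The
helper names marker_open and marker_write are made up for illustration
and are not existing Mesa or GPUVis functions.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open the ftrace marker file, or return -1.
 * The tracefs path is tried first, then the older debugfs location. */
int marker_open(void)
{
    int fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY);
    if (fd < 0)
        fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY);
    return fd;
}

/* Hypothetical helper: format one marker and emit it with a single
 * write(2), so each call becomes one entry in the trace buffer.
 * Returns the number of bytes written, or -1 on error. */
int marker_write(int fd, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof(buf))
        len = (int)sizeof(buf) - 1;   /* output was truncated to fit */

    return (int)write(fd, buf, (size_t)len);
}
```

A caller would open the fd once at startup and then do something like
marker_write(fd, "frame %d begin", n) around the region of interest.
If marker_open fails, tracing is simply skipped.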
>> I mean, code added to drivers themselves preferably should not have
>> anything perfetto/percetto specific. Tracing system specific code
>> should be only in one place (even if it's just macros in common header).
>> > This helps developers
>> > quickly answer questions like:
>> > - How long are frames taking?
>> That doesn't require any changes to Mesa. Just set a uprobe on a suitable
>> buffer swap function, and parse kernel ftrace events. This way
>> starting tracing doesn't even require restarting the tracked processes.
> But this doesn't tell you how long the GPU is spending doing what. My
> rough idea is to hook up an optional callback to u_tracepoint so we
> can generate perfetto traces on the GPU timeline (ie. with
> timestamps captured from GPU), fwiw
I implemented a feature called INTEL_MEASURE based off of a tool that
Ken wrote. It captures render/batch/frame timestamps in a BO, providing
durations on the GPU timeline. It works for Iris and Anv.
The approach provides accurate GPU timing, with minimal stalling. This
data could be presented in Perfetto or GPUVis.
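As a rough illustration of the post-processing involved (this is not
the actual INTEL_MEASURE code), the sketch below turns begin/end
timestamp pairs read back from a buffer into durations in nanoseconds.
The struct layout, the function names, and the idea that the timestamp
frequency is known in Hz are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one begin/end pair of raw GPU timestamp ticks
 * per measured event, as read back from the buffer. */
struct ts_pair {
    uint64_t begin;
    uint64_t end;
};

/* Convert raw ticks to nanoseconds. The whole-second and sub-second
 * parts are scaled separately so that ticks * 1e9 cannot overflow
 * 64 bits for large tick counts. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    const uint64_t ns_per_s = 1000000000ull;
    return (ticks / freq_hz) * ns_per_s +
           (ticks % freq_hz) * ns_per_s / freq_hz;
}

/* Fill out_ns[i] with the GPU-side duration of event i. Assumes that
 * end >= begin for every pair, i.e. no counter wraparound handling. */
void pairs_to_durations_ns(const struct ts_pair *pairs, size_t count,
                           uint64_t freq_hz, uint64_t *out_ns)
{
    for (size_t i = 0; i < count; i++)
        out_ns[i] = ticks_to_ns(pairs[i].end - pairs[i].begin, freq_hz);
}
```

The resulting durations, together with the begin timestamps, are what
a visualizer would need to draw the events on a GPU timeline.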
>>  glXSwapBuffers, eglSwapBuffers, eglSwapBuffersWithDamageEXT,
>>  Many apps resolve the "vkQueuePresentKHR" Vulkan API loader wrapper
>> function and call the backend function, like "anv_QueuePresentKHR",
>> directly, so it's better to track the latter instead.
>> > - What caused a particular frame drop?
>> > - Is it CPU bound or GPU bound?
>> That doesn't require adding tracepoints to Mesa, just checking CPU & GPU
>> utilization (which is a lower-level thing).
>> > - Did a CPU core frequency drop cause something to go slower than
>> > usual?
>> Note that nowadays actual CPU frequencies are often controlled by HW /
>> firmware, so you don't necessarily get any ftrace event from freq
>> change, you would need to poll MSR registers instead (which is a
>> privileged operation, and polling can easily miss changes).
>> > - Is something else running that is stealing CPU or GPU time? Could
>> > I
>> > fix that with better thread/context priorities?
>> > - Are all CPU cores being used effectively? Do I need
>> > sched_setaffinity
>> > to keep my thread on a big or little core?
>> I don't think these require adding tracepoints to Mesa either...
>> > - What’s the latency between CPU frame submit and GPU start?
>> I think this would require tracepoints in kernel GPU code more than in
>> - Eero
>> > *What Does Mesa + Perfetto Provide?*
>> > Mesa is in a unique position to produce GPU trace data for several GPU
>> > vendors without requiring the developer to build and install
>> > additional
>> > tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
>> > The key is making it easy for developers to use. Ideally, perfetto is
>> > eventually available by default in mesa so that if your system has
>> > perfetto
>> > traced running, you just need to run perfetto (perhaps along with
>> > setting
>> > an environment variable) with the mesa categories to see:
>> > - GPU processing timeline events.
>> > - GPU counters.
>> > - CPU events for potentially slow functions in mesa like shader
>> > compiles.
>> > Example of what this data might look like (with fake GPU events):
>> > [image: percetto-gpu-example.png]
>> > *Runtime Characteristics*
>> > - ~500KB additional binary size. Even with using only the basic
>> > features
>> > of perfetto, it will increase the binary size of mesa by about
>> > 500KB.
>> > - Background thread. Perfetto uses a background thread for
>> > communication
>> > with the system tracing daemon (traced) to advertise trace data and
>> > get
>> > notification of trace start/stop.
>> > - Runtime overhead when disabled is designed to be optimal with one
>> > predicted branch, typically a few CPU cycles
>> > <https://perfetto.dev/docs/instrumentation/track-events#performance>
>> > per
>> > event. While enabled, the overhead can be around 1 us per event.
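For reference, the "one predicted branch" pattern for a disabled
tracepoint usually amounts to a relaxed load of an enable flag plus a
branch hinted as unlikely. The sketch below shows that general
pattern only. It is not perfetto's or percetto's implementation, and
every name in it (trace_enabled, TRACE_EVENT, trace_emit_slow,
trace_set_enabled, events_emitted) is made up for illustration.
__builtin_expect is a GCC/Clang builtin.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global enable flag, flipped when a trace session
 * starts or stops. */
atomic_bool trace_enabled = false;

/* Stand-in for the expensive path; a real backend would serialize
 * the event here. The counter exists only to make the effect visible. */
unsigned long events_emitted = 0;

void trace_emit_slow(const char *name)
{
    (void)name;
    events_emitted++;
}

void trace_set_enabled(bool on)
{
    atomic_store_explicit(&trace_enabled, on, memory_order_relaxed);
}

/* When tracing is off, this costs one relaxed load and one branch
 * that the compiler is told to treat as not taken. */
#define TRACE_EVENT(name)                                              \
    do {                                                               \
        if (__builtin_expect(                                          \
                atomic_load_explicit(&trace_enabled,                   \
                                     memory_order_relaxed), 0))        \
            trace_emit_slow(name);                                     \
    } while (0)
```

With this shape, instrumented code just calls TRACE_EVENT("compile")
at the point of interest, and all serialization cost stays behind the
flag check.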
>> > *Integration Challenges*
>> > - The perfetto SDK is C++ and designed around macros, lambdas,
>> > inline
>> > templates, etc. There are ongoing discussions on providing an
>> > official
>> > perfetto C API, but it is not yet clear when this will land on the
>> > perfetto
>> > roadmap.
>> > - The perfetto SDK is an amalgamated .h and .cc that adds up to
>> > 100K
>> > lines of code.
>> > - Anything that includes perfetto.h takes a long time to compile.
>> > - The current Perfetto SDK design is incompatible with being a
>> > shared
>> > library behind a C API.
>> > *Percetto*
>> > The percetto library <https://github.com/olvaffe/percetto> was
>> > recently
>> > implemented to provide an interim C API for perfetto. It provides
>> > efficient
>> > support for scoped trace events, multiple categories, counters, custom
>> > timestamps, and debug data annotations. Percetto also provides some
>> > features that are important to mesa, but not available yet with
>> > perfetto
>> > SDK:
>> > - Trace events from multiple perfetto instances in separate shared
>> > libraries (like mesa and virglrenderer) show correctly in a single
>> > process
>> > and thread view.
>> > - Counter tracks and macro API.
>> > Percetto is missing API for perfetto's GPU DataSource and counter
>> > support,
>> > but that feature could be implemented next if it is important for
>> > mesa.
>> > With the existing percetto API mesa could present GPU trace data as
>> > named
>> > 'slice' events and int64_t counters with custom timestamps as shown in
>> > the
>> > image above (based on this sample
>> > <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>)
>> > .
>> > *Mesa Integration Alternatives*
>> > Note: we have some pressing needs for performance analysis in Chrome
>> > OS, so
>> > I'm intentionally leaving out the alternative of waiting for an
>> > official
>> > perfetto C API. Of course, once that C API is available it would
>> > become an
>> > option to migrate to it from any of the alternatives below.
>> > Ordered by difficulty with easiest first:
>> > 1. Statically link with percetto as an optional external dependency
>> > (virglrenderer
>> > now has this approach
>> > <
>> > https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480
>> > >
>> > ).
>> > - Pros: API already supports most common tracing needs. Tested and
>> > used
>> > by an increasing number of CrOS components.
>> > - Cons: External dependency for optional mesa build option.
>> > 2. Embed Perfetto SDK + a Percetto fork/copy.
>> > - Pros: API already supports most common tracing needs. No added
>> > external dependency for mesa.
>> > - Cons: Percetto code divergence, bug fixes need to land in two
>> > trees.
>> > 3. Embed Perfetto SDK + custom C wrapper.
>> > - Pros: Tailored API for mesa's needs.
>> > - Cons: Nontrivial development efforts and maintenance.
>> > 4. Generate C stubs for the Perfetto protobuf and reimplement the
>> > Perfetto SDK in C.
>> > - Pros: Tailored API for mesa's needs. Possible smaller binary
>> > impact
>> > from simpler implementation.
>> > - Cons: Significant development efforts and maintenance.
>> > Regardless of the integration direction, I expect we would disable
>> > perfetto
>> > in the default build for now to minimize disruption.
>> > I like #1, because there are some nontrivial subtleties to the C
>> > wrapper
>> > that provide both API conveniences and runtime performance that would
>> > need
>> > to be reimplemented or maintained with the other options. I will also
>> > volunteer to do #1 or #2, but I'm not sure I have time for #3 or #4
>> > :D.
>> > Any other thoughts on how best to integrate perfetto into mesa?
>> > -jb
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev