[Mesa-dev] Perfetto CPU/GPU tracing

Rob Clark robdclark at gmail.com
Sat Feb 13 02:00:20 UTC 2021


On Fri, Feb 12, 2021 at 5:40 PM John Bates <jbates at chromium.org> wrote:
>
>
>
> On Fri, Feb 12, 2021 at 4:34 PM Rob Clark <robdclark at gmail.com> wrote:
>>
>> On Thu, Feb 11, 2021 at 5:40 PM John Bates <jbates at chromium.org> wrote:
>> >
>>
>> <snip>
>>
>> > Runtime Characteristics
>> >
>> > ~500KB additional binary size. Even when using only the basic features of perfetto, it will increase the binary size of mesa by about 500KB.
>>
>> IMHO, that size is negligible.. looking at freedreno, a mesa build
>> *only* enabling freedreno is already ~6MB.. distros typically use
>> "megadriver" (ie. all the drivers linked into a single .so with hard
>> links for the different ${driver}_dri.so), which on my fedora laptop
>> is ~21M.  Maybe if anything is relevant it is how much of that
>> actually gets paged into RAM from disk, but I think 500K isn't a thing
>> to worry about too much.
>>
>> > Background thread. Perfetto uses a background thread for communication with the system tracing daemon (traced) to advertise trace data and get notification of trace start/stop.
>>
>> Mesa already tends to have plenty of threads.. some of that depends on
>> the driver, I think currently radeonsi is the threading king, but
>> there are several other drivers working on threaded_context and async
>> compile thread pool.
>>
>> It is worth mentioning that, AFAIU, perfetto can operate in
>> self-server mode, which seems like it would be useful for distros
>> which do not have the system daemon.  I'm not sure if we lose that
>> with percetto?
>
>
> Easy to add, but want to avoid a runtime arg because it would add ~300KB to binary size. Okay if we have an alternate init function though.

I think I could imagine wanting mesa build params to control whether
we want self-server or system-server mode.. ie. if some distros add
system-server support they wouldn't need self-server mode, and vice
versa.
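
For reference, in the Perfetto SDK the mode is picked by the `backends`
bitmask in `perfetto::TracingInitArgs` (`kInProcessBackend` for
self-server, `kSystemBackend` for traced). A minimal C sketch of turning
that choice into build-time switches, assuming two hypothetical defines
(`MESA_TRACE_SYSTEM_BACKEND`, `MESA_TRACE_INPROCESS_BACKEND`) that a meson
option could set; none of these names exist in mesa today:

```c
#include <assert.h>

/* Hypothetical build switches (e.g. set from a meson option).
 * Defaults here: system daemon only. */
#ifndef MESA_TRACE_SYSTEM_BACKEND
#define MESA_TRACE_SYSTEM_BACKEND 1
#endif
#ifndef MESA_TRACE_INPROCESS_BACKEND
#define MESA_TRACE_INPROCESS_BACKEND 0
#endif

static_assert(MESA_TRACE_SYSTEM_BACKEND || MESA_TRACE_INPROCESS_BACKEND,
              "enable at least one tracing backend");

/* Mirrors the idea of perfetto's backend bitmask, not its actual values. */
enum trace_backend {
   TRACE_BACKEND_INPROCESS = 1u << 0, /* self-server, no traced required */
   TRACE_BACKEND_SYSTEM = 1u << 1,    /* connect to the system traced */
};

/* Resolved at compile time, so an init path built on this could avoid
 * referencing the disabled mode's code at all. */
static unsigned
trace_backends(void)
{
   unsigned mask = 0;
#if MESA_TRACE_INPROCESS_BACKEND
   mask |= TRACE_BACKEND_INPROCESS;
#endif
#if MESA_TRACE_SYSTEM_BACKEND
   mask |= TRACE_BACKEND_SYSTEM;
#endif
   return mask;
}
```

Whether this actually saves the ~300KB depends on the linker being able
to drop the unreferenced backend code; that part is an assumption, not
something I've measured.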

>
>>
>>
>> > Runtime overhead when disabled is designed to be optimal with one predicted branch, typically a few CPU cycles per event. While enabled, the overhead can be around 1 us per event.
>> >
>> > Integration Challenges
>> >
>> > The perfetto SDK is C++ and designed around macros, lambdas, inline templates, etc. There are ongoing discussions on providing an official perfetto C API, but it is not yet clear when this will land on the perfetto roadmap.
>> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines of code.
>> > Anything that includes perfetto.h takes a long time to compile.
>> > The current Perfetto SDK design is incompatible with being a shared library behind a C API.
>>
>> So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
>> But maybe we should verify that MSVC is happy with it, otherwise we
>> need to take a bit more care in some parts of the codebase.
>>
>> As far as compile time, I wonder if we can regenerate the .cc/.h with
>> only the gpu trace parts?  But I wouldn't expect the .h to be
>> something widely included.  For example, for gpu timeline traces in
>> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
>> extern "C" {} around the callbacks that would hook into the
>> u_tracepoint tracepoints.  That one file would pull in the perfetto
>> .h, and we'd just not build that file if perfetto was disabled.
>
>
> That works for GPU, but I'd like to see some slow CPU functions in traces as well to help reason about performance problems. This ends up peppering the trace header in lots of places.

My point was that we could strip out a whole lot of stuff that is
completely unrelated to mesa.. not sure if it is worth bothering with;
I doubt we'd #include perfetto.h very widely.
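
Roughly what I have in mind for the GPU side: a small C header is the
only thing the driver includes, and only the one C++ file implementing
it includes perfetto.h. A sketch (header and an example call site folded
into one file for brevity), assuming a hypothetical `HAVE_PERFETTO`
define and hypothetical `trace_render_start()`/`trace_render_end()`
hooks, which are not existing mesa or perfetto API:

```c
#include <assert.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

struct trace_batch; /* hypothetical; opaque to the tracing code */

#ifdef HAVE_PERFETTO
/* Defined in the one C++ file that includes perfetto.h. */
void trace_render_start(struct trace_batch *batch, uint64_t ts_ns);
void trace_render_end(struct trace_batch *batch, uint64_t ts_ns);
#else
/* Perfetto disabled at build time: the hooks compile away. */
static inline void
trace_render_start(struct trace_batch *batch, uint64_t ts_ns)
{
   (void)batch;
   (void)ts_ns;
}

static inline void
trace_render_end(struct trace_batch *batch, uint64_t ts_ns)
{
   (void)batch;
   (void)ts_ns;
}
#endif

#ifdef __cplusplus
}
#endif

/* Example driver call site: no #ifdef HAVE_PERFETTO needed here. */
static void
submit_batch(struct trace_batch *batch, uint64_t start_ns, uint64_t end_ns)
{
   assert(start_ns <= end_ns);
   trace_render_start(batch, start_ns);
   /* ... emit and flush the command stream ... */
   trace_render_end(batch, end_ns);
}
```

The .cc side would define the same two functions with C linkage and
forward them to perfetto; with perfetto disabled, the static inline
stubs mean the driver's call sites stay free of #ifdefs.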

>> Overall having to add our own extern C wrappers in some places doesn't
>> seem like the *end* of the world.. a bit annoying, but we might end up
>> doing that regardless if other folks want the ability to hook in
>> something other than perfetto?
>
>
> It's more than extern C wrappers if we want to minimize overhead while tracing enabled at compile time. Have a look at percetto.h/cc.

I'm not sure how many distros are not using LTO these days.. I assume
once you have LTO it doesn't really matter anymore?
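
To make the LTO question concrete: the part that has to stay cheap is
the disabled check, and that can live inline in a header so it doesn't
depend on LTO at all; only the enabled slow path needs to cross into the
C++ file. A sketch, assuming a hypothetical global `trace_enabled` flag
toggled by the backend's start/stop notification and a GCC/Clang
toolchain for `__builtin_expect`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, set/cleared when the tracing backend reports that a
 * trace session started/stopped. In a real header this would be extern. */
static atomic_bool trace_enabled;

/* Stand-in for the out-of-line slow path that would live in the single
 * C++ file including perfetto.h; here it only counts events. */
static unsigned trace_event_count;

static void
trace_event_slow(const char *name)
{
   assert(name != NULL);
   trace_event_count++;
}

/* Fast path: when tracing is off this is one relaxed load and one branch
 * hinted as not taken, with no call into another translation unit. */
#define TRACE_EVENT(name)                                                  \
   do {                                                                    \
      if (__builtin_expect(atomic_load_explicit(&trace_enabled,            \
                                                memory_order_relaxed), 0)) \
         trace_event_slow(name);                                           \
   } while (0)
```

With LTO, an out-of-line C wrapper around the check would likely get
inlined anyway, so the difference may only matter for builds without it.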

>>
>>
>> <snip>
>>
>> > Mesa Integration Alternatives
>>
>> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
>> that is mostly because I expect to initially just add some basic gpu
>> timeline tracepoints, but over time iterate on adding more.. it would
>> be nice to not have to depend on a newer version of an external
>> library at each step.  That is ofc only my $0.02..
>
>
> It's a small initial setup tax, true, but I still think it depends on what perfetto features we plan to use -- for only a couple files doing GPU tracing I agree percetto is unnecessary, but for CPU tracing it gets more complicated.

Definitely the first thing I plan to use is getting render stages onto
a timeline, so I can better see where the GPU time is going.. second
step is probably adding more gpu perfcntr.. and I guess the third
thing is more CPU oriented things like seeing where shader compiles
are happening.  Although threaded_context might also be a thing where
having some more CPU tracing could be useful?
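
For the CPU side (shader compiles, threaded_context), plain C has no
RAII, but the GCC/Clang `cleanup` attribute gets close to what perfetto's
scoped `TRACE_EVENT` does in C++. A sketch, where `trace_begin()` and
`trace_end()` are hypothetical hooks that just print here instead of
forwarding to perfetto, and `compile_shader()` is a made-up example:

```c
#include <assert.h>
#include <stdio.h>

static int trace_depth;

/* Hypothetical hooks; a real build would forward these to perfetto
 * from the C++ side instead of printing. */
static void
trace_begin(const char *name)
{
   assert(name != NULL);
   printf("%*sbegin %s\n", trace_depth * 2, "", name);
   trace_depth++;
}

/* cleanup handlers receive a pointer to the variable going out of scope. */
static void
trace_end(const char **name)
{
   trace_depth--;
   printf("%*send %s\n", trace_depth * 2, "", *name);
}

#define TRACE_CAT_(a, b) a##b
#define TRACE_CAT(a, b) TRACE_CAT_(a, b)

/* Opens a slice now and closes it on every exit from the enclosing scope. */
#define TRACE_SCOPE(name)                                       \
   const char *TRACE_CAT(trace_scope_, __LINE__)                \
      __attribute__((cleanup(trace_end))) = (trace_begin(name), (name))

static int
compile_shader(int fail)
{
   TRACE_SCOPE("compile_shader");
   if (fail)
      return -1; /* early return still closes the slice */
   {
      TRACE_SCOPE("optimize");
      /* ... optimization loop ... */
   }
   return 0;
}
```

The upside over explicit begin/end pairs is that early-return error
paths, which compilers have plenty of, can't leave a slice open.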

BR,
-R
