[Mesa-dev] Perfetto CPU/GPU tracing

Thu Feb 18 11:17:56 UTC 2021

Hey folks,
I'm one of the authors and maintainers of Perfetto (also adding +skyostil@).
I am really sorry for the giant bulk reply. I'll try to do my best to 
answer the various open questions about Perfetto but I don't know a 
better way than some heavy <snip>-ing given I'm joining the party late.

In short:

- Yep, so far the only distribution we have for the SDK is a C++ 
amalgamation. I am aware that it isn't great for Linux OSS projects; it 
was heavily optimized for Google projects that have the habit of statically 
linking everything.

- There are plans to move beyond that and have a stable C API (docs 
linked below). But that will take us quite some time. We should probably 
figure out some intermediate solution meanwhile.

- I'd be really keen to learn how Mesa is intending to do tracing. That 
can heavily influence our upcoming design. Begin/end markers are IMO the 
least interesting thing as they tend to work in whatever form and are 
easy to abstract. Richer/structured trace points like 
https://github.com/google/perfetto/tree/master/protos/perfetto/trace/gpu 
(currently used by Android GPU drivers [1]) are more interesting and are 
where most of the challenges lie.

- Maybe the discussion here needs to be split into: (1) a shorter-term 
plan to iterate, figure out what works and what doesn't, and see what the 
end result looks like; (2) a longer-term plan for what the API surface 
should look like.

I don't have strong opinions on how Mesa should proceed here and you 
don't need an extra cook in the kitchen. If I really had to express a 
handwavy opinion, my best advice would be to start with something you 
can iterate on right now, maybe behind some compile-time flag, and come 
up with a plan on how to turn it into a production thing later on. We are 
interested to hear your feedback and adjust the design of our stable C API.

[1] 
https://android-developers.googleblog.com/2020/09/android-gpu-inspector-open-beta.html?m=1

On the tracing library / C++ vendoring / stable C API:

The way the Perfetto SDK is organized today is mainly influenced-by and 
designed-for the way Google handles its projects, which boils down to: 
(i) statically link everything, to minimize the testing matrix; (ii) 
move fast and refactor all dependencies as needed.
It's all about "who pays the maintenance cost and when?". This tends to 
work well in a large company which: (i) has a giant repo which allows 
~atomic cross-project changes; (ii) has the resources to keep everything 
up to date.
I am perfectly aware this is neither appealing nor ideal for open source 
projects and, more generally, for the way libraries in Linux 
distributions work. I hear you when you say "vendoring [...] seems a bad 
idea". Yes, it implicitly pushes the burden of up-revving onto the 
"depender". [that's bad]
We are committed to maintaining ABI stability of our tracing protocol and 
socket (see https://perfetto.dev/docs/design-docs/api-and-abi). This is 
because Chrome, Android, and now CrOS, and tools like gpuinspector.dev, 
which all statically link perfetto in some form, have strongly different 
release cycles. [that's good]
We also try not to break the C++ API too much, as robdclark@ found 
when updating through our v3..v12 monthly releases [that's good]. But 
that C++ API has too wide a surface and we can't commit to making it a 
fully stable API. Nor can we make the current C++ ABI stable across a 
.so boundary (the C++ SDK today heavily depends on inlines to allow the 
compiler to see through the layers). [that's bad]

For this reason, we recently started making plans to change the way we 
do things to meet the needs of open source projects and not just 
Google-internal ones. [that's good]
Specifically (Note: to open the docs below you need to join 
https://groups.google.com/forum/#!forum/perfetto-dev to inherit the ACLs):

1. https://bit.ly/perfetto-debian has a plan to distribute tracing 
services and SDK as standard pkg-config based packages (at least for 
Debian. We'll rely on volunteers for other distros)

2. https://bit.ly/perfetto-c has a plan + ongoing discussion for having 
a long-term stable C API in a libperfetto.so . The key here for us 
(Perfetto) is identifying a subset of the wider C++ API that fits the 
bill for projects out there and that we are comfortable maintaining 
longer term.

The one thing I also need to be very clear on, though, is that both the 
perfetto-debian and perfetto-c discussions are very recent and it will take 
a while for us to get there. We can't commit to a specific timeline 
right now, but if I had to make an educated estimate I'd say more 
towards end-of-2021. [that's bad]
I'd also be more keen to commit once there are concrete use-cases, 
ideally with iterations/feedback from a project like Mesa.

[obligatory reference at this point: https://youtu.be/Krbl911ZPBA?t=22]

 > eero.t.tamminen@
 > Just set uprobe for suitable buffer swap function [1], and parse 
kernel ftrace events. (paraphrasing for context: "why do we need 
instrumentation points? why can't we just use uprobes instead?")

The problems with uprobes are:
1. They are Linux-specific. Perhaps not a big problem for Mesa here, but the 
reason why we didn't go there with Perfetto, at least until now, is that 
we need to support all major OSes (Linux, CrOS, Android, Windows, macOS).
2. Even on Linux-based systems, it's really hard to have uprobes enabled 
in production (I am not sure what the situation is for CrOS). At Google, 
we care a lot about being able to trace from production devices without 
reflashing them with dev images, because then we can just tell people 
that are experiencing problems "can you just open chrome://tracing, 
click on a button and give us debugging data?". Re-flashing reduces by 
orders of magnitude the actionable feedback we'd be able to get from 
users / testers.
The challenge with uprobes is that they rely on dynamic rewriting of 
.text pages. Whenever I mention that, a platform security team reacts 
like the Frau Blucher horses (https://youtu.be/bps5hJ5DQDw?t=10), with 
understandable reasons.

 > eero.t.tamminen@
 > Perfetto appears to be for Android / Chrome(OS?), and not available 
from common Linux distro repos.

That's right. I am aware of the problem. The plan is to address it with 
bit.ly/perfetto-debian as a starter.

 > mark.a.janes@
 > Perfetto seems like an awful lot of infrastructure to capture trace 
events.  Why not follow the example of GPUVis, and write generic 
trace_markers to ftrace?

In my experience ftrace's trace_marker:
1. Works for very simple types of events (e.g. 
begin-slice/end-slice/counters) but doesn't work with richer / structured 
event types like the ones linked above, as that gets into 
stringification format, marshaling costs and interop.
2. Writing into the marker has some non-trivial cost (IIRC 1-10 us on 
Android), as it involves a round trip into the kernel and back;
3. It leads to one global ring buffer, where fast events push out slower 
ones, which is particularly problematic if you ever enable sched/* 
events. At the userspace level, instead, we can dynamically route 
different events to different buffers to mitigate this problem (see 
https://perfetto.dev/docs/concepts/config#dynamic-buffer-mapping)
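
To illustrate point 3, here is a toy simulation of the effect. This is 
not Perfetto or ftrace code: the event kinds, rates and buffer sizes are 
made-up assumptions, and `simulate` is a hypothetical helper. It only 
shows why a single ring buffer tends to lose rare events once a 
high-frequency source is enabled, and how routing each kind of event to 
its own buffer avoids that.

```python
from collections import deque


def simulate(total_events=1000, rare_every=100, shared_cap=50,
             rare_cap=10, frequent_cap=40):
    """Return (rare events kept in one shared buffer,
               rare events kept when using split buffers)."""
    shared = deque(maxlen=shared_cap)          # one global ring buffer
    rare_buf = deque(maxlen=rare_cap)          # dedicated buffer for rare events
    frequent_buf = deque(maxlen=frequent_cap)  # dedicated buffer for frequent events

    for i in range(total_events):
        # Assumed workload: one rare event for every `rare_every` events.
        kind = "rare" if i % rare_every == 0 else "frequent"
        shared.append(kind)  # every event competes for the same slots
        (rare_buf if kind == "rare" else frequent_buf).append(kind)

    return shared.count("rare"), len(rare_buf)


if __name__ == "__main__":
    kept_shared, kept_split = simulate()
    print(f"shared buffer kept {kept_shared} rare events, "
          f"split buffers kept {kept_split}")
```

With these particular numbers the shared buffer ends up holding none of 
the rare events, while the dedicated buffer keeps all ten, even though 
the total capacity is the same in both setups.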

 > dylan@
 > But maybe we should verify that MSVC is happy with it

FYI I have some work in progress to fully support the tracing protocol 
on Windows. With https://r.android.com/1539396, the tracing services and 
the tracing SDK build and run with both clang-cl and MSVC 2019. We are 
figuring out some final details (e.g., whether to use AF_UNIX and 
support only Win10+, or use a TCP socket)

 > robdclark@
 > AFAIU, perfetto can operate in self-server mode, which seems like it 
would be useful for distros which do not have the system daemon.

That's correct. Which is also a reason why the amalgamated source is 
that large, as it also contains all the bits to run the service 
in-process. (If you link with -Wl,--gc-sections or equivalent, they will 
be stripped away though)
See 
https://perfetto.dev/docs/instrumentation/tracing-sdk#in-process-vs-system-mode

 > mark.a.janes@
 > Is it possible to build and run the Perfetto UI locally?

Yes, the UI is fully open source and fully client-only. It can be built 
from the same repo by just running `tools/install-build-deps --ui`  + 
`./ui/build --serve` . See 
https://perfetto.dev/docs/contributing/build-instructions#ui-development

 > Can it display arbitrary trace events that are written to 
/sys/kernel/tracing/trace_marker

Today the only thing that the UI displays, w.r.t. trace_marker, are 
events that match the syntax that Android's systrace came up with back 
then. See 
https://cs.android.com/android/platform/superproject/+/master:system/core/libcutils/trace-dev.cpp, 
it boils down to a format like "B|$TGID|event_name" to begin a slice, 
"E|..." to end it and so on.
Curious to hear about other uses of the trace marker and consider 
importing other formats.
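
As a rough sketch of that format (illustration only): the helper names 
below are hypothetical, and only the "B|$TGID|event_name" shape is taken 
from the description above. The payload after "E|" is left unparsed on 
purpose, since its exact layout isn't spelled out here. Writing to 
/sys/kernel/tracing/trace_marker assumes tracefs is mounted there and 
that the process has write permission.

```python
import os


def format_begin(name, tgid=None):
    """Build a begin-slice marker of the form "B|<tgid>|<event_name>"."""
    if tgid is None:
        tgid = os.getpid()  # on Linux, getpid() returns the thread group id
    return f"B|{tgid}|{name}"


def parse_marker(line):
    """Classify one marker line.

    Returns ("B", (tgid, name)) for begin markers, ("E", raw_payload) for
    end markers, and None for anything that doesn't match.
    """
    line = line.rstrip("\n")
    if line.startswith("B|"):
        parts = line.split("|", 2)  # the event name may itself contain '|'
        if len(parts) == 3 and parts[1].isdigit():
            return "B", (int(parts[1]), parts[2])
        return None
    if line.startswith("E|"):
        return "E", line[2:]  # payload deliberately left uninterpreted
    return None


def write_marker(line, path="/sys/kernel/tracing/trace_marker"):
    """Write one marker line (needs tracefs and write permission)."""
    with open(path, "w") as f:
        f.write(line)
```

For example, `parse_marker(format_begin("draw", tgid=42))` gives back 
`("B", (42, "draw"))`.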

 > Can it be extended to show i915 and i915-perf-recorder events?

It depends on what you want to extend:
1.  If you want to extend the import logic and map custom events to 
existing UI concepts, you just need to touch the C++ code in 
//src/trace_processor (that code runs in the Web UI on the client via 
WebAssembly). Good starting examples are:
i. the code that imports ninja-build logs, which is a completely custom 
format, unrelated to perfetto's proto format or ftrace: 
https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/trace_processor/importers/ninja/ninja_log_parser.cc

ii. the code that deals with Android's usage of the trace_marker 
(those "B|<tgid>|name" mentioned above): 
https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/trace_processor/importers/systrace/

We are generally happy to accept patches for custom importers, as long 
as they have some testing (an input trace and expected output from 
queries) so we can tell if we break it while refactoring our internal code.

2. If you want to extend the UI logic with custom widgets
This is the part that is tough today and requires a lot of insider 
knowledge. All the code is there, but today it is not really architected in 
a contributor-friendly way. We have a longer term plan to allow 
customization of the UI with some extension API on the JS/TS layer, but 
it's still early days for that and getting there will take longer.

 > dylan@
 > especially if the ui code stops working with our forked version

1. You can always build a blessed version of the UI and host it on any 
HTTP server of your choice. Our /ui/build generates a fully static 
HTML+JS+Wasm site. It doesn't need any server-side magic. It doesn't 
have dependencies on Google infra. Even just `python -m http.server` 
will do.

2. We are changing the way our UI deployment works (this month) and will 
always leave the old versions around. This means that, while we will 
keep up-revving the main UI instance @ https://ui.perfetto.dev ~monthly, 
we will allow people to go back to older versions via a link like 
https://ui.perfetto.dev/v1.2.3/. You can see this in action on
https://testing-dot-perfetto-ui.wl.r.appspot.com/v12.1.172/ which I just 
pushed last week to test this new deployment mechanism.
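
For point 1, a minimal sketch of what self-hosting could look like using 
only Python's standard library, equivalent in spirit to 
`python -m http.server`. The function name is hypothetical, and the 
directory argument is a placeholder for wherever the static UI build 
output lives on your machine (that path is an assumption, not something 
stated above).

```python
import functools
import sys
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_static_server(directory, host="127.0.0.1", port=8000):
    """Create an HTTP server that serves the files under `directory` as-is."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


if __name__ == "__main__":
    # Usage: python serve_ui.py <path-to-static-ui-build>
    server = make_static_server(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("serving on http://%s:%d/" % server.server_address[:2])
    server.serve_forever()
```

Nothing here is specific to the UI: any server able to hand out static 
files should work the same way.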

Happy to talk more about all this, either here on the list or on any 
other medium.

If you have some more perfetto-related questions/comments/criticism feel 
free to drop by our discord channel https://discord.gg/35ShE3A or ML 
(https://groups.google.com/g/perfetto-dev).