[Mesa-dev] Perfetto CPU/GPU tracing

Thu Feb 18 15:40:24 UTC 2021

On 18/02/2021 14:35, Tamminen, Eero T wrote:
> Hi,
> 
> On Thu, 2021-02-18 at 12:17 +0100, Primiano Tucci wrote:
> [discussion about Perfetto itself]
> ...
>> eero.t.tamminen@
>> from in common Linux distro repos.
>>
>> That's right. I am aware of the problem. The plan is to address it
>> with bit.ly/perfetto-debian as a starter.
> 
> Glad to hear something is planned for making things easier for distros!
> 
> 
>>> eero.t.tamminen@
>>> Just set a uprobe for a suitable buffer swap function [1], and parse
>>> kernel ftrace events.
>> (paraphrasing for context: "why do we need instrumentation points?
>> Why can't we just use uprobes instead?")
>>
>> The problem with uprobes is:
>> 1. It's Linux-specific. Perhaps not a big problem for Mesa here, but the
>> reason why we didn't go there with Perfetto, at least until now, is that
>> we need to support all major OSes (Linux, CrOS, Android, Windows,
>> macOS).
> 
> The main point here is that uprobes already work without any
> modifications to any libraries (I have a script that has been used for FPS
> tracking of daily 3D stack builds for many years).
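To make the uprobe-based FPS tracking concrete: once a uprobe is set on the swap function, every buffer swap appears as one line in the ftrace text output, and an average frame rate falls out of the timestamps. This is not the script mentioned above (it isn't shown here); it is a minimal C sketch assuming the default `trace_pipe` text layout `TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: ...`, and the helper names `parse_event_ts` and `fps_from_trace` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line of ftrace text output (e.g. read from trace_pipe). If the
 * line is a hit of `event`, store its timestamp in seconds in *ts and
 * return 1; otherwise return 0. Assumes the default layout
 * "TASK-PID [CPU] FLAGS TIMESTAMP: EVENT: ...". */
static int parse_event_ts(const char *line, const char *event, double *ts)
{
    char needle[128];
    int n = snprintf(needle, sizeof needle, ": %s:", event);
    assert(n > 0 && (size_t)n < sizeof needle);

    const char *hit = strstr(line, needle);
    if (hit == NULL)
        return 0;

    /* The timestamp is the space-delimited token right before the hit. */
    const char *start = hit;
    while (start > line && start[-1] != ' ')
        start--;

    char *end;
    double t = strtod(start, &end);
    if (end == start || end != hit) /* token was empty or not a number */
        return 0;
    *ts = t;
    return 1;
}

/* Average frame rate over all hits of `event`: (hits - 1) frame intervals
 * divided by the time between the first and the last hit. */
static double fps_from_trace(const char *const *lines, size_t count,
                             const char *event)
{
    double first = 0.0, last = 0.0, ts;
    size_t hits = 0;

    for (size_t i = 0; i < count; i++) {
        if (!parse_event_ts(lines[i], event, &ts))
            continue; /* headers, other events, etc. */
        if (hits == 0)
            first = ts;
        last = ts;
        hits++;
    }
    if (hits < 2 || last <= first)
        return 0.0;
    return (double)(hits - 1) / (last - first);
}
```

For example, four `glXSwapBuffers` hits at 10.00, 10.25, 10.50 and 10.75 s give 3 / 0.75 s = 4 FPS; lines for other events in between are skipped.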

Marking the begin/end of a function with a timestamp is the easy part, and
you can do it with an arbitrarily large set of tracing/debugging tools. In
my experience, things get more interesting when you want to start dumping
the state of entire subsystems into the trace, so that you can reason
about them after the fact when looking at the trace timeline.

E.g. instrumentation points like the following, which go beyond the
notion of dynamic breakpoints:

https://chromium.googlesource.com/chromium/src/+/4f331b42066d4e729c28facaa9bd9d4c33c6bbfd/components/viz/common/quads/compositor_render_pass.cc#116

which eventually enable tracing features like:
https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/using-frameviewer 

> 
> And other OSes already offer similar functionality, e.g. DTrace should
> be available for both Mac & Windows.
> 
> 
> Btw, if really needed, you could even implement similar functionality in
> user space on other OSes. We did something similar at Nokia using
> ptrace before uprobes were a thing:
> 	https://github.com/maemo-tools-old/functracer
> 
> (While possible, it's unlikely to be feasible for general tracing,
> because of the need for some tricky architecture-specific assembly, and
> because certain instructions need extra assembly fixups when replaced by
> a breakpoint.)
> 
> 
>> 2. Even on Linux-based systems, it's really hard to have uprobes enabled
>> in production (I am not sure what the situation is for CrOS).
>>
>> At Google, we care a lot about being able to trace from production
>> devices without reflashing them with dev images, because then we can
>> just tell people that are experiencing problems "can you just open
>> chrome://tracing, orders of magnitude the actionable feedback we'd be
>> able to get from users / testers.
> 
> I would think that having tracepoint code in the libraries themselves,
> generally enabled for some specific tracing solution like Perfetto, is
> *much* less likely.

IMO this is one of the key points that you (Mesa) folks need to discuss
here: whether (i) the trace points are directly tied to the API of
Perfetto (or any other tracing system) (I buy the skepticism given the
current state of things), or (ii) you have some Mesa-specific tracing
abstraction layer and you wire up Perfetto (or whatever else) in some
"Mesa tracing backend impl", so the dependency surface is minimized.
In my experience, (ii) tends to be a bit more appealing, but its
feasibility depends on "what" you want to trace, i.e. how much your
instrumentation points look like begin/end markers and counters (easy)
or full object state like the link above, in which case the risk is that
you'll end up writing a lot of boilerplate code and double-copies of
state objects to avoid the direct deps.
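A minimal sketch of option (ii), just to give the discussion something concrete. Every name here (`mesa_trace_backend`, `mesa_trace_begin`, `fake_draw`, and so on) is hypothetical, not an existing Mesa API. Driver code only sees a vtable pointer; a Perfetto backend would presumably live in its own C++ file and forward these calls to the Perfetto SDK (not shown), so nothing else links against it. It only covers the "easy" begin/end/counter case; dumping full object state through such a layer is where the extra structs and double copies would come in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend vtable: a Perfetto backend, a trace_marker
 * backend, etc. would each fill one of these in. */
struct mesa_trace_backend {
    void (*begin)(void *ctx, const char *name);
    void (*end)(void *ctx);
    void (*counter)(void *ctx, const char *name, int64_t value);
    void *ctx;
};

/* NULL means tracing is off. Not thread-safe: a real version would need an
 * atomic pointer, or would set the backend once at driver load. */
static const struct mesa_trace_backend *active_backend;

static void mesa_trace_set_backend(const struct mesa_trace_backend *b)
{
    assert(b == NULL || (b->begin && b->end && b->counter));
    active_backend = b;
}

/* The only calls driver code makes: one pointer check when tracing is
 * off, and no dependency on any tracing SDK. */
static inline void mesa_trace_begin(const char *name)
{
    if (active_backend)
        active_backend->begin(active_backend->ctx, name);
}

static inline void mesa_trace_end(void)
{
    if (active_backend)
        active_backend->end(active_backend->ctx);
}

static inline void mesa_trace_counter(const char *name, int64_t value)
{
    if (active_backend)
        active_backend->counter(active_backend->ctx, name, value);
}

/* Stand-in backend that records events as text instead of talking to a
 * real tracing system. */
struct recorder { char log[256]; };

static void rec_append(struct recorder *r, const char *s)
{
    strncat(r->log, s, sizeof r->log - strlen(r->log) - 1);
}

static void rec_begin(void *ctx, const char *name)
{
    char buf[64];
    snprintf(buf, sizeof buf, "B:%s;", name);
    rec_append(ctx, buf);
}

static void rec_end(void *ctx) { rec_append(ctx, "E;"); }

static void rec_counter(void *ctx, const char *name, int64_t value)
{
    char buf[64];
    snprintf(buf, sizeof buf, "C:%s=%lld;", name, (long long)value);
    rec_append(ctx, buf);
}

/* A hypothetical instrumented driver entry point. */
static void fake_draw(void)
{
    mesa_trace_begin("draw");
    mesa_trace_counter("vertices", 36);
    mesa_trace_end();
}
```

With the recorder installed as the backend, `fake_draw()` logs `B:draw;C:vertices=36;E;`; with no backend set it records nothing.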

Perhaps the best way is to have snippets of code to see what that would
look like.

> 
> At least both Ubuntu and Fedora default kernels have had uprobes built
> in for *many* years.
> 
> Up-to-date Fedora 33 kernel:
> $ grep UPROBE /boot/config-5.10.15-200.fc33.x86_64
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_UPROBES=y
> CONFIG_UPROBE_EVENTS=y
> 
> Same on up-to-date Ubuntu 20.04:
> $ grep UPROBE /boot/config-5.4.0-65-generic
> CONFIG_ARCH_SUPPORTS_UPROBES=y
> CONFIG_UPROBES=y
> CONFIG_UPROBE_EVENTS=y
>

Somebody more knowledgeable about CrOS should chime in, but from a 
codesearch, I don't think they are enabled on CrOS:

https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/kernel/v5.4/arch/x86/configs/chromiumos-jail-vm-x86_64_defconfig;l=254?q=CONFIG_UPROBES%20-%22%23if%22%20-%22obj-%22&sq=&ss=chromiumos%2Fchromiumos%2Fcodesearch:src%2Fthird_party%2Fkernel%2Fv5.4%2F

> 
>> The challenge of uprobes is that they rely on dynamic rewriting of
>> .text pages. Whenever I mention that, a platform security team reacts
>> like the Frau Blucher horses (https://youtu.be/bps5hJ5DQDw?t=10), with
>> understandable reasons.
> 
> I'm not sure you've given them an accurate picture of it.
> 
> It doesn't require executable code to be writable from user space;
> library code can remain read-only, because the kernel can toggle the
> relevant page writable for uprobe breakpoint setup and back.

The problem is not who rewrites the .text pages (although, yes, I agree 
that the kernel doing this is better than userspace doing it). The 
problem is:

1. Losing the ability to verify the integrity of system executables.
Once they diverge from the corresponding file pages on disk, you can't
tell whether some malware/rootkit altered them or uprobes did.
Effectively, you lose the ability to verify the full chain of
bootloader -> system image -> file integrity.

2. In general, a mechanism that allows dynamic rewriting of code is a
wide attack surface, not welcome on production devices (for the same
reason you can't just attach a debugger to a production device). It is
very unlikely to fly for non-dev images IMHO. Many system processes
contain highly sensitive information, like cookie jars, OAuth2 tokens etc.
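A toy C illustration of point 1, with hypothetical helper names: a breakpoint-based tracer replaces the first byte of the probed instruction with a trap (`0xCC`, x86 `int3`), so a digest of the in-memory page no longer matches the digest of the file page on disk. FNV-1a is only a stand-in for a real cryptographic hash here; the point is that the mismatch by itself says nothing about *who* changed the byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a. Stand-in for a real cryptographic hash; it is NOT
 * collision resistant and must not be used for actual integrity checks. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Returns 1 if the in-memory copy of a page still matches the digest
 * recorded for the corresponding file page on disk. */
static int page_matches_disk(const unsigned char *mem, size_t n,
                             uint64_t disk_digest)
{
    return fnv1a64(mem, n) == disk_digest;
}

/* What any software-breakpoint tracer (uprobes, ptrace-based tools, or
 * malware) does to the page: overwrite one byte with x86 int3. Returns the
 * original byte so the caller can restore it. */
static unsigned char patch_breakpoint(unsigned char *mem, size_t n, size_t off)
{
    assert(off < n);
    unsigned char orig = mem[off];
    mem[off] = 0xCC;
    return orig;
}
```

From the verifier's side, a patched page and a tampered page look the same: both simply fail `page_matches_disk()` until the original byte is restored.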

> 
> # cat /sys/kernel/tracing/uprobe_events
> p:uprobes/glXSwapBuffers /opt/lib/libGL.so.1.2.0:0x000000000003bab0
> 
> # grep -h /opt/lib/libGL.so.1.2.0 /proc/*/maps | sort
> 7f486ab51000-7f486ab6a000 r--p 00000000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f486ab6a000-7f486abaf000 r-xp 00019000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f486abaf000-7f486abc6000 r--p 0005e000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f486abc6000-7f486abc9000 r--p 00074000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f486abc9000-7f486abca000 rw-p 00077000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f491438d000-7f49143a6000 r--p 00000000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f49143a6000-7f49143eb000 r-xp 00019000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f49143eb000-7f4914402000 r--p 0005e000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f4914402000-7f4914405000 r--p 00074000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f4914405000-7f4914406000 rw-p 00077000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> 7f6296d62000-7f6296d7b000 r--p 00000000 08:03 7865435 /opt/lib/libGL.so.1.2.0
> ...
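To see where the `glXSwapBuffers` probe in the output above lands in each process, here is a small C sketch. It assumes, as the kernel's uprobe tracer documentation describes, that the number after `PATH:` is an offset into the file, and maps that offset into a `/proc/PID/maps` entry. Offset `0x3bab0` only falls inside the `r-xp` mappings (file offset `0x19000`), e.g. at `0x7f486ab6a000 + (0x3bab0 - 0x19000) = 0x7f486ab8cab0` in the first process, and those mappings remain non-writable from user space. `struct mapping`, `parse_maps_line` and `probe_vaddr` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The fields of one /proc/PID/maps line that matter here. */
struct mapping {
    unsigned long long start, end, file_off;
    char perms[5];                       /* e.g. "r-xp" */
};

/* Parse "start-end perms offset dev inode path"; returns 1 on success. */
static int parse_maps_line(const char *line, struct mapping *m)
{
    assert(line != NULL && m != NULL);
    return sscanf(line, "%llx-%llx %4s %llx",
                  &m->start, &m->end, m->perms, &m->file_off) == 4;
}

/* If the probe's file offset lies inside this executable mapping, return
 * the virtual address where the breakpoint lands in that process, else 0. */
static unsigned long long probe_vaddr(const struct mapping *m,
                                      unsigned long long probe_off)
{
    unsigned long long len = m->end - m->start;

    if (m->perms[2] != 'x')
        return 0;                        /* probe must be in code */
    if (probe_off < m->file_off || probe_off >= m->file_off + len)
        return 0;                        /* offset not covered */
    return m->start + (probe_off - m->file_off);
}
```

The same file offset resolves to a different virtual address in every process that maps the library, e.g. `0x7f49143c8ab0` for the second `r-xp` mapping listed above.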
> 
> 
>>   > mark.a.janes@
>> events.  Why not follow the example of GPUVis, and write generic
>> trace_markers to ftrace?
>>
>> In my experience ftrace's trace_marker:
>> 1. Works for very simple types of events (e.g.
>> begin-slice/end-slice/counters) but doesn't work with richer / structured
>> event types like the ones linked above, as that gets into
>> stringification formats, marshaling costs and interop.
>> 2. Writing into the marker has some non-trivial cost (IIRC 1-10 us on
>> Android), as it involves a round trip into the kernel and back;
>> 3. It leads to one global ring buffer, where fast events push out slower
>> ones, which is particularly problematic if you ever enable sched/*
>> events. At the userspace level, instead, we can dynamically route
>> different events to different buffers to mitigate this problem (see
>> https://perfetto.dev/docs/concepts/config#dynamic-buffer-mapping)
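For comparison with points 1 and 2, roughly what a minimal trace_marker writer looks like. This is a sketch assuming the text convention Android's atrace uses (`B|pid|name`, `E|pid`, `C|pid|name|value`), which Perfetto's trace importer understands; the function names are hypothetical. Each marker is one `write(2)`, i.e. one round trip into the kernel, and every event is flattened into a short string, which is why richer structured events fit poorly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Emit an already formatted marker with a single write(2).
 * Returns 0 on success, -1 on a failed or short write. */
static int emit(int fd, const char *buf, int n)
{
    if (n < 0)
        return -1;
    return write(fd, buf, (size_t)n) == (ssize_t)n ? 0 : -1;
}

/* Begin a slice named `name` on behalf of process `pid`. */
static int marker_begin(int fd, pid_t pid, const char *name)
{
    char buf[256];
    assert(name != NULL);
    int n = snprintf(buf, sizeof buf, "B|%d|%s", (int)pid, name);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;                       /* name too long for this sketch */
    return emit(fd, buf, n);
}

/* End the most recent open slice of process `pid`. */
static int marker_end(int fd, pid_t pid)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "E|%d", (int)pid);
    return emit(fd, buf, n);
}

/* Record a counter sample `name` = `value`. */
static int marker_counter(int fd, pid_t pid, const char *name, long long value)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "C|%d|%s|%lld", (int)pid, name, value);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    return emit(fd, buf, n);
}
```

In real use `fd` would come from `open("/sys/kernel/tracing/trace_marker", O_WRONLY)` (or the older `/sys/kernel/debug/tracing/trace_marker` path) and `pid` from `getpid()`; any writable fd, such as a pipe, works for trying it out.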
> 
> Yes, if you need more context, or handle really frequent events, static
> breakpoints are a better choice.
> 
> 
> For more frequent events, on Linux one might consider using some
> BPF program to process dynamic tracepoint data, so that a much smaller
> amount needs to be transferred to user space. But I'm not sure whether
> support for attaching BPF to tracepoints is in the upstream Linux kernel
> yet.

eBPF, which you can use with tracepoints in recent kernels, solves a
different problem. It solves e.g. (1) dynamic filtering or (2)
computing aggregations from hi-freq events. It doesn't solve problems
like "I want to see all scheduling events and all frame-related
userspace instrumentation points, but given that sched events are so
hi-traffic, I want to put them in a separate buffer so they don't
clobber all the rest". Turning scheduling events into a histogram
(something you can do with eBPF+tracepoints) doesn't really solve cases
where you want to follow the full scheduling block/wake chain while some
userspace event is taking unexpectedly long.
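A toy C model of the clobbering problem, with hypothetical names and a deliberately tiny capacity: one overwrite-oldest ring buffer (the behavior of ftrace's default overwrite mode) versus routing sched events into a buffer of their own, which is what Perfetto's per-data-source target buffers allow. It is not how either system is implemented, only the effect on what survives.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 8                 /* tiny on purpose; real buffers are MBs */

enum { EV_FRAME = 1, EV_SCHED = 2 };

/* Overwrite-oldest ring buffer of event tags. */
struct ring {
    int ev[RING_CAP];
    size_t head, count;
};

static void ring_push(struct ring *r, int ev)
{
    r->ev[r->head] = ev;           /* overwrites the oldest entry when full */
    r->head = (r->head + 1) % RING_CAP;
    if (r->count < RING_CAP)
        r->count++;
}

/* Count surviving events with a given tag. Until the buffer wraps, the
 * valid entries are ev[0..count-1]; after that, every slot is valid. */
static size_t ring_count_of(const struct ring *r, int ev)
{
    size_t n = 0;
    for (size_t i = 0; i < r->count; i++)
        if (r->ev[i] == ev)
            n++;
    return n;
}

/* Emit `frames` frame events, each followed by `sched_per_frame` sched
 * events. With `separate` set, sched events go to `sched_buf`; otherwise
 * everything shares `frame_buf`. */
static void simulate(struct ring *frame_buf, struct ring *sched_buf,
                     int separate, int frames, int sched_per_frame)
{
    assert(!separate || sched_buf != NULL);
    for (int f = 0; f < frames; f++) {
        ring_push(frame_buf, EV_FRAME);
        for (int s = 0; s < sched_per_frame; s++)
            ring_push(separate ? sched_buf : frame_buf, EV_SCHED);
    }
}
```

For example, `simulate(&shared, NULL, 0, 5, 20)` leaves `ring_count_of(&shared, EV_FRAME) == 0`, because the last 8 slots are all sched events, whereas `simulate(&frames, &sched, 1, 5, 20)` keeps all 5 frame events while the sched buffer still holds its most recent 8.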

Primiano
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>