[Mesa-dev] Perfetto CPU/GPU tracing
Primiano Tucci
primiano at chromium.org
Thu Feb 18 20:07:48 UTC 2021
On 18/02/2021 20:26, Tamminen, Eero T wrote:
> Hi,
>
> (This isn't really related to Mesa anymore, but maybe it's still of
> interest.)
>
> On Thu, 2021-02-18 at 16:40 +0100, Primiano Tucci wrote:
>
>> On 18/02/2021 14:35, Tamminen, Eero T wrote:
> [...]
>>> It doesn't require executable code to be writable from user-space,
>>> library code can remain read-only because kernel can toggle relevant
>>> page writable for uprobe breakpoint setup and back.
>>
>> The problem is not who rewrites the .text pages (although, yes, I agree
>> that the kernel doing this is better than userspace doing it). The
>> problem is:
>>
>> 1. Losing the ability to verify the integrity of system executables:
>> you can no longer tell whether some malware/rootkit altered them or
>> uprobes did. Effectively you lose the ability to verify the full chain
>> of bootloader -> system image -> file integrity.
>
> Why would you lose it?
>
> Integrity checks will succeed when there are no trace points enabled,
> and trace points should be enabled only when you start tracing, so you
> know what is causing integrity check failures (especially when they
> start passing again once you disable the tracepoints).
If you do this (suspending integrity checks while tracing) the message
out there becomes: "if you write malware, the very first thing you
should do is enable tracing, so any system integrity check will be
suppressed" :)
Things like uprobes (i.e. anything that can dynamically alter the
execution flow of system processes) are typically available only on
engineering setups, where you have control of the device / kernel /
security settings (Yama, SELinux or any other security module), not on
production devices.
I understand that the situation for most (all?) Linux-based distros is
different, as you can just sudo. But on many other embedded OSes - at
least the Google ones like CrOS and Android - the security model is way
stricter.
We could argue that this is bad / undesirable / too draconian, but that
is not something that any of us has the power to change. At some point
each platform decides where it wants to be on the spectrum between "easy
to hack" and "secure for the user". The CrOS model is: you can hack as
much as you want, but you first need to re-flash it in dev-mode.
>> 2. In general, a mechanism that allows dynamic rewriting of code is a
>> wide attack surface, not welcome on production devices (for the same
>> reasons), and very unlikely to fly for non-dev images IMHO. Many
>> system processes contain too sensitive information like the cookie
>> jar, oauth2 tokens etc.
>
> Isn't there any kind of dev-mode which would be required to enable
> things that are normally disallowed?
That requires following steps that are non-trivial for non-tech-savvy
users and, more importantly, wiping the device (CrOS calls this
"power-washing") [1].
We can't ask users to reflash their device just to give us a trace when
they are experiencing problems. Many of those problems can't be
reproduced by engineers because they depend on some peculiar state the
user is in. A recent example (not related to Mesa): some users were
experiencing an extremely unresponsive (Chrome) UI. After looking at
traces, engineers figured out that the root cause (and hence the repro)
was: "you need to have a (Chrome) tab whose title is long enough to
cause ellipsis and that also has an emoji in the left-most visible part.
The emoji causes invalidation of the cached font measurement (this is
the bug), which causes every UI draw to be awfully slow."
For problems like this (which are very frequent) we really need to ask
users to give us traces. And that really needs to be a one-click thing
for them or they will not be able to help us.
[1]
https://www.chromium.org/chromium-os/chromiumos-design-docs/developer-mode
>
> (like kernel modifying RO mapped user-space process memory pages)
>
>
>>
> [...]
>>> Yes, if you need more context, or handle really frequent events,
>>> static breakpoints are a better choice.
>>>
>>> In case of more frequent events, on Linux one might consider using
>>> some BPF program to process dynamic tracepoint data so that a much
>>> smaller amount of data needs to be transferred to user-space. But I'm
>>> not sure whether support for attaching BPF to tracepoints is in the
>>> upstream Linux kernel yet.
>>
>> eBPF, which you can use in recent kernels with tracepoints, solves a
>> different problem. It solves e.g. (1) dynamic filtering or (2)
>> computing aggregations from hi-freq events. It doesn't solve problems
>> like "I want to see all scheduling events and all frame-related
>> userspace instrumentation points. But given that sched events are so
>> hi-traffic I want to put them in a separate buffer, so they don't
>> clobber all the rest". Turning scheduling events into a histogram
>> (something you can do with eBPF+tracepoints) doesn't really solve
>> cases where you want to follow the full scheduling block/wake chain
>> while some userspace event is taking unexpectedly long.
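
(To make the aggregation point above concrete, the sketch below shows a
per-task counter over sched_switch built with eBPF. It's a minimal
illustration only - it assumes a clang/libbpf-style build and that the
field layout matches /sys/kernel/tracing/events/sched/sched_switch/format
on the target kernel; it is not something Perfetto itself ships.)

  /* count_sched_switch.bpf.c - minimal eBPF+tracepoint aggregation sketch.
   * Counts how often each task gets switched out, i.e. the kind of
   * "turn hi-freq sched events into an aggregation" use case above. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 10240);
          __type(key, __u32);   /* pid being switched out */
          __type(value, __u64); /* number of context switches */
  } switch_count SEC(".maps");

  /* Leading fields of the sched_switch tracepoint record; check the
   * authoritative layout in .../events/sched/sched_switch/format. */
  struct sched_switch_args {
          unsigned long long common; /* common_* header, 8 bytes */
          char prev_comm[16];
          int prev_pid;
          int prev_prio;
          /* prev_state, next_comm, next_pid, ... follow (unused here) */
  };

  SEC("tracepoint/sched/sched_switch")
  int count_sched_switch(struct sched_switch_args *ctx)
  {
          __u32 pid = ctx->prev_pid;
          __u64 one = 1, *val;

          val = bpf_map_lookup_elem(&switch_count, &pid);
          if (val)
                  __sync_fetch_and_add(val, 1);
          else
                  bpf_map_update_elem(&switch_count, &pid, &one, BPF_ANY);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";

(Useful for "how many context switches per task", but, as said above, it
doesn't give you the block/wake chain on a timeline.)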
>
> You could e.g. filter out all sched events except the ones for the
> process you're interested in. That should already provide a huge
> reduction in the amount of data, for use-cases where the scheduling of
> the rest of the processes is of less interest.
Yeah, but in many cases you don't know upfront which sched events you
are interested in until you see the trace. On modern embedded OSes,
where everything, even fetching a font or playing a notification sound,
requires some IPC with various services, it's very hard to tell upfront
what the critical path is.
>
> However, I think high frequency kernel tracing is a different use-case
> from user-space tracing, which requires its own tooling [1] (and just a
> few user-space trace points to provide context for traced kernel
> activity).
I disagree. In my (mostly Android-related) experience what engineers
need is the union of kernel (specifically scheduling) tracing AND
userspace tracing **on the same timeline**. Userspace tracing tells you
when (on the timeline) something that was important for the user (or
for the app lifecycle) happened or took too much time. Kernel tracing
helps you understand the real reasons why. This is especially true for
cases of lock contention or priority inversion, where the kernel traces
can explain why things didn't happen in time, which task (and possibly
which callstack, via perf_event_open) signalled the mutex we blocked on,
and so on.
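
(For reference, the perf_event_open() bit boils down to something like
the sketch below: a timer-based sampling event on a target pid that
requests callchains, with samples landing in an mmap'd ring buffer. This
is a minimal illustration with assumed event type, frequency and buffer
size, and it skips the ring-buffer parsing; it's not necessarily how
Perfetto wires it up.)

  /* sample_callchains.c - minimal perf_event_open() callchain sketch. */
  #include <linux/perf_event.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                       int group_fd, unsigned long flags)
  {
          return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd,
                         flags);
  }

  int main(int argc, char **argv)
  {
          pid_t target = (argc > 1) ? atoi(argv[1]) : getpid();

          struct perf_event_attr attr;
          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_SOFTWARE;
          attr.config = PERF_COUNT_SW_CPU_CLOCK;  /* timer-based sampling */
          attr.freq = 1;
          attr.sample_freq = 100;                 /* ~100 samples/sec */
          attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                             PERF_SAMPLE_CALLCHAIN;
          attr.disabled = 1;
          /* Kernel frames need perf_event_paranoid <= 1 (or privileges). */

          int fd = perf_open(&attr, target, /*cpu=*/-1, /*group_fd=*/-1, 0);
          if (fd < 0) {
                  perror("perf_event_open");
                  return 1;
          }

          /* 1 metadata page + 8 data pages; the kernel writes
           * PERF_RECORD_SAMPLE records (including callchains) here and a
           * real profiler would parse them out of this buffer. */
          size_t len = 9 * (size_t)sysconf(_SC_PAGESIZE);
          void *ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                            fd, 0);
          if (ring == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }

          ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
          sleep(1);                               /* profile for one second */
          ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

          printf("sampled pid %d; callchains are in the mmap'd buffer\n",
                 (int)target);
          munmap(ring, len);
          close(fd);
          return 0;
  }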
>
>
> - Eero
>
> [1] In a corporate setting I would expect this kind of latency
> investigation to actually be HW-assisted, otherwise tracing itself
> disturbs the system too much. Ultimately it could be using instruction
> branch tracing to catch *everything*
HW-assisted tracing via LBR is definitely an extremely interesting power
tool. Whether it's a must-have or a nice-to-have really depends on the
classes of problems one needs to investigate.
For system-architecture issues (interaction between N processes across
IPC, or across VMs) that level of refinement (minimal overhead) is
typically not required. For micro-architecture problems (CPU
pipeline-related, cache efficiency, branch prediction hit ratio and the
like) it is.
> , as both ARM and x86 have HW support for that.
Not really, the situation for ARM is more complicated. IIRC, to date the
only LBR equivalent on ARM requires the Embedded Trace Macrocell (ETM)
hw. But ETM is very expensive in terms of silicon area and is typically
present only on pre-production / testing devices. I am not aware of any
production ARM-based SoC that ships ETM.
>
> (Instruction branch tracing doesn't include context, but that can be
> injected separately into the data stream. Because it catches everything,
> one can infer some of the context from the trace itself too. I don't
> think there are any good Open Source post-processing / visualization
> tools for such data though.)
>