[Mesa-dev] RFC: Memory allocation on Mesa

Tamminen, Eero T eero.t.tamminen at intel.com
Tue May 12 18:08:36 UTC 2020


Hi,

On Tue, 2020-05-12 at 14:36 +0000, Jose Fonseca wrote:
> From: mesa-dev <mesa-dev-bounces at lists.freedesktop.org> on behalf of
> Tamminen, Eero T <eero.t.tamminen at intel.com>
> I've done a lot of resource usage analysis at my former job[1], but
> I've never needed anything like that.  malloc etc. all reside in a
> separate shared library from Mesa, so calls to them always cross
> dynamic library boundary and therefore all of them can be caught with
> the dynamic linker features (LD_PRELOAD, LD_AUDIT...).
> 
> True, one can easily intercept all mallocs with that sort of dynamic
> linking trick when doing full application interception.  (I have even
> done something of the sort on https://github.com/jrfonseca/memtrail ,
> mostly to hunt down memory leaks in LLVM.)  But the goal here is to
> intercept the OpenGL/Vulkan driver's malloc calls alone, not the
> application's mallocs, which is difficult to segregate when doing
> whole-application interception.

The only reason to do this that I can think of would be to report some
Mesa-specific metrics to an application at run-time.  But that would
need careful thought on how *applications* are supposed to *use* that
data, otherwise it's just "garbage in, garbage out".

What are the exact use-cases for Mesa-specific allocation data?


> For simplicity imagine you have only these shared objects:
>    application (not controlled by us)
>    libVulkan.so
>    libstdc++.so
>    libc.so
> 
> Now imagine you're intercepting malloc with some sort of LD_PRELOAD
> interception, and malloc is called.  How do you know whether it's a
> call done by the Vulkan driver, in which case the callback should be
> invoked, or one done by the application, in which case the Vulkan
> allocation callback should not be invoked?
> 
> One can look at the caller's IP address, but if the caller is in
> libstdc++, which is used both by Vulkan and the app, it's not
> immediately clear which to bill the memory to.  One would need to
> walk back the stack completely, which is complicated and not very
> reliable.

Over a decade ago it was unreliable, but not anymore.  Even stripped
binaries contain the frame information section (I think it's needed
e.g. for handling C++ exceptions).  So nowadays it's simple to use, and
glibc has a function for it.
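
For illustration, here's a rough sketch (not code from Mesa or
memtrail; the buffer size and the logging hook are made-up
placeholders) of an LD_PRELOAD-style malloc wrapper that records the
caller stack with glibc's backtrace() from <execinfo.h>:

/* Sketch only: LD_PRELOAD-style malloc wrapper that records the call
 * stack with glibc's backtrace().  Sizes and names are illustrative. */
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace() */
#include <dlfcn.h>      /* dlsym(), RTLD_NEXT */
#include <stddef.h>

static void *(*real_malloc)(size_t);

void *malloc(size_t size)
{
    if (!real_malloc)
        /* NB: dlsym() may itself allocate; a production wrapper
         * needs a re-entrancy guard here (see further down). */
        real_malloc = dlsym(RTLD_NEXT, "malloc");

    void *frames[32];
    int depth = backtrace(frames, 32);   /* callers' return addresses */

    void *ptr = real_malloc(size);
    /* log_allocation(ptr, size, frames, depth);  -- hypothetical hook,
     * e.g. append a record to a trace file for post-processing */
    return ptr;
}

Built as a shared object and injected with LD_PRELOAD, that's enough to
get per-allocation backtraces out of an unmodified process.  (Note that
backtrace() itself may allocate on its first call, which is another
reason for the recursion guard discussed below.)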

For analysis, you don't do filtering while tracing, as that's too
limiting; you do it in the post-processing phase.


If you really need to do run-time per-library filtering, you could do
it based on the backtrace addresses, and whether they fall within the
address range where the library you're interested in is mapped.

For that, you need to overload library loading, so you can catch when
Mesa gets loaded, and find out its (address-space-randomized) load
address range.
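
An alternative that skips hooking the loader (just a sketch; the
library-name matching and helper names are illustrative assumptions) is
to ask the dynamic linker for the library's mapped range with
dl_iterate_phdr() and test the backtrace addresses against it:

/* Sketch only: find the mapped address range of a given shared object
 * with dl_iterate_phdr(), then test whether backtrace addresses fall
 * inside it. */
#define _GNU_SOURCE
#include <link.h>      /* dl_iterate_phdr() */
#include <string.h>
#include <stdint.h>

struct range { const char *name; uintptr_t lo, hi; };

static int find_range(struct dl_phdr_info *info, size_t size, void *data)
{
    struct range *r = data;

    if (!info->dlpi_name || !strstr(info->dlpi_name, r->name))
        return 0;                      /* not the library we want */

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t lo = info->dlpi_addr + ph->p_vaddr;
        uintptr_t hi = lo + ph->p_memsz;
        if (r->lo == 0 || lo < r->lo) r->lo = lo;
        if (hi > r->hi)               r->hi = hi;
    }
    return 1;                          /* found it, stop iterating */
}

/* Does any frame of the backtrace point into the named library? */
static int backtrace_hits_library(void *const *frames, int depth,
                                  const char *libname)
{
    struct range r = { libname, 0, 0 };
    dl_iterate_phdr(find_range, &r);

    for (int i = 0; i < depth; i++) {
        uintptr_t a = (uintptr_t)frames[i];
        if (a >= r.lo && a < r.hi)
            return 1;
    }
    return 0;
}

Since dlpi_addr already reflects the randomized load base, ASLR is
handled for free; if the driver gets dlopen()ed later, the lookup just
has to be redone (or dlopen interposed, as described above).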


> Imagine one guesses wrong -- the malloc interceptor believes the
> malloc call is done by the Vulkan driver, and calls the application
> callback, which then calls malloc again, and the interceptor guesses
> wrong again, therefore an infinite recursion loop.

If you find your own code addresses from the backtrace, STOP. :-)
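
In code, that "STOP" can be a simple per-thread re-entrancy flag; a
sketch, assuming a wrapper like the earlier one (the recording hook is
again hypothetical):

/* Sketch of a re-entrancy guard: if the interceptor (or a callback it
 * invoked) ends up back in malloc, fall through to the real allocator
 * instead of recursing. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

static __thread int in_interceptor;          /* per-thread guard flag */
static void *(*real_malloc)(size_t);

void *malloc(size_t size)
{
    if (!real_malloc)
        real_malloc = dlsym(RTLD_NEXT, "malloc");

    if (in_interceptor)                      /* nested call: just allocate */
        return real_malloc(size);

    in_interceptor = 1;
    void *ptr = real_malloc(size);
    /* record_backtrace(ptr, size);  -- hypothetical, may malloc internally */
    in_interceptor = 0;
    return ptr;
}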


> Could you be confusing this with trying to catch some Mesa-specific
> function, where the dynamic linker can catch only calls from the
> application to Mesa, but not calls within the Mesa library itself (as
> they don't cross the dynamic library boundary)?
> 
> My goal from the beginning is intercepting all mallocs/frees done by
> the Mesa OpenGL/Vulkan driver, and only those.

If this is for analysis, I would do it in the post-processing phase
(with sp-rtrace).  Just ask the post-processor to filter in allocation
backtraces going through the library I'm interested in.


...
> Yes, indeed Linux has much better tooling for this than Windows.
> Note that memory debugging on Windows is just one of our needs.  The
> other is being able to run the Mesa driver on an embedded system with
> a fixed amount of memory (a separate budget for Mesa mallocs).

If that embedded target is still running something Linux-like, but
doesn't have enough memory for collecting the data, I would just stream
the data out of the device, or store it to a file, and do the analysis
in the post-processing phase.

If that's not an option, I would do the memory analysis on a system
which has reasonable tools, and "emulate" the relevant functional
differences of the embedded device on it.  If the issue isn't
reproducible that way, it may be a thread / data race, which needs
different tools (Valgrind, mutrace, etc.).


	- Eero

PS. In the maemo-tools project, there's also a tool called
"functracer", which on ARM & x86 can attach to an already running
process and start collecting resource information.  The sp-rtrace
tooling can post-process its output.

(It basically does in user space, with the ptrace syscall, what ftrace
does in kernel space.  This requires some arch-specific assembly, so
unfortunately it's not very portable.)


