Proposal to report GPU private memory allocations with sysfs nodes [plain text version]

Mon Oct 28 18:33:57 UTC 2019

On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse <jglisse at redhat.com> wrote:

> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
> > Hi folks,
> >
> > This is the plain text version of the previous email in case that was
> > considered as spam.
> >
> > --- Background ---
> > On the downstream Android, vendors used to report GPU private memory
> > allocations with debugfs nodes in their own formats. However, debugfs
> > nodes are getting deprecated in the next Android release.
>
> Maybe explain why it is useful first ?
>

Memory is precious on Android mobile platforms. Apps that use a large amount
of memory, such as games, tend to maintain a table for the memory on
different devices, with different prediction models. Private GPU memory
allocations are currently only partially visible to the apps and to the
platform.

By having the data, the platform can do the following:
(1) GPU memory profiling as part of the huge Android profiler in progress.
(2) The Android system health team can enrich the performance test coverage.
(3) We can collect field metrics to detect any regression in GPU private
    memory allocations in the production population.
(4) A shell user can easily dump the allocations in a uniform way across
    vendors.
(5) The platform can feed the data to the apps so that apps can do memory
    allocations in a more predictable way.

> >
> > --- Proposal ---
> > We are taking the chance to unify all the vendors to migrate their
> > existing debugfs nodes into a standardized sysfs node structure. Then
> > the platform is able to do a bunch of useful things: memory profiling,
> > system health coverage, field metrics, local shell dump, in-app api,
> > etc. This proposal is better served upstream as all GPU vendors can
> > standardize a gpu memory structure and reduce fragmentation across
> > Android and Linux that clients can rely on.
> >
> > --- Detailed design ---
> > The sysfs node structure looks like below:
> > /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
> > e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer" and the gl_buffer is a
> > node having the comma separated size values: "4096,81920,...,4096".
>
> How does the kernel know what API the allocation is used for? With the
> open source drivers you never specify what API is creating a gem object
> (opengl, vulkan, ...) nor for what purpose (transient, shader, ...).
>

Oh, is it a hard requirement that the open source drivers not bookkeep any
data from userland? I think the API is just some additional metadata passed
down.
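
To make the node format from the detailed design concrete, here is a minimal
Python sketch of how a userspace reader might parse the contents of one
<type_name> node. The helper name parse_sizes and the sample value are
assumptions for illustration only; nothing here is an existing Android or
kernel interface.

```python
from typing import List


def parse_sizes(node_text: str) -> List[int]:
    """Parse the comma separated size values of one <type_name> node."""
    text = node_text.strip()
    if not text:
        # An empty node is treated as "no allocations of this type".
        return []
    return [int(field) for field in text.split(",")]


# Hypothetical node contents; a real reader would get this text from sysfs.
sizes = parse_sizes("4096,81920,4096\n")
print(len(sizes), "allocations,", sum(sizes), "bytes in total")
```

Treating an empty node as an empty list is a choice made for this sketch; the
proposal does not say what an empty node should contain.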

>
> > For the top level root, vendors can choose their own names based on the
> > value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple gpu driver
> > cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the 2nd and 3rd
> > KMDs. (2) It's also allowed to put some sub-dir for example
> > "kgsl/gpu_mem" or "mali0/gpu_mem" in the ro.gfx.sysfs.<channel> property
> > if the root name under /sys/devices/ is already created and used for
> > other purposes.
>
> On one side you want to standardize, on the other you want to give
> complete freedom on the top level naming scheme. I would rather see a
> consistent naming scheme (i.e. something more restrained and with little
> room for interpretation by individual drivers).
>

Thanks for commenting on this. We definitely need some suggestions on the
root directory. In the multi-gpu case on desktop, is there some existing
consumer to query "some data" from all the GPUs? How does the tool find all
GPUs and differentiate between them? Is this already standardized?

> > For the 2nd level "pid", there are usually just a couple of them per
> > snapshot, since we only take snapshots for the active ones.
>
> I do not understand this. You can have any number of applications with
> GPU objects, and thus there is no bound on the number of PIDs. Please
> consider desktop too; I do not know what kind of limitation Android
> imposes.
>

We are only interested in tracking *active* GPU private allocations. So yes,
any application currently holding an active GPU context will probably have a
node here. Since we want to do profiling for specific apps, the data has to
be per-application. I don't get your concerns here. If it's about the
tracking overhead, it's rare to see tons of applications doing private gpu
allocations at the same time. Could you help elaborate a bit?
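
For reference, here is a rough Python sketch of what a per-application
snapshot reader over the proposed <root>/<pid>/<type_name> layout could look
like. The function name collect_snapshot and the root path used in the usage
lines are hypothetical, and the sketch assumes every file under a pid
directory is a type node holding comma separated sizes.

```python
from pathlib import Path
from typing import Dict, List


def collect_snapshot(root: Path) -> Dict[int, Dict[str, List[int]]]:
    """Map each pid directory under root to {type_name: [sizes...]}."""
    snapshot: Dict[int, Dict[str, List[int]]] = {}
    for pid_dir in root.iterdir():
        # Only directories with numeric names are treated as pids.
        if not (pid_dir.is_dir() and pid_dir.name.isdigit()):
            continue
        per_type: Dict[str, List[int]] = {}
        try:
            for node in pid_dir.iterdir():
                if not node.is_file():
                    continue
                text = node.read_text().strip()
                per_type[node.name] = (
                    [int(v) for v in text.split(",")] if text else []
                )
        except OSError:
            # The process may have exited while we were scanning; skip it.
            continue
        snapshot[int(pid_dir.name)] = per_type
    return snapshot


# Hypothetical root, following the "mali0/gpu_mem" example in the proposal.
root = Path("/sys/devices/mali0/gpu_mem")
if root.is_dir():
    for pid, types in collect_snapshot(root).items():
        print(pid, {name: sum(sizes) for name, sizes in types.items()})
```

The cost of one snapshot in this sketch grows with the number of pid
directories, which is where the question about the number of active
applications matters.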

> > For the 3rd level "type_name", the type name will be one of the GPU
> > memory object types in lower case, and the value will be a comma
> > separated sequence of size values for all the allocations under that
> > specific type.
> >
> > We especially would like some comments on this part. For the GPU memory
> > object types, we defined 9 different types for Android:
> > (1) UNKNOWN // not accounted for in any other category
> > (2) SHADER // shader binaries
> > (3) COMMAND // allocations which have a lifetime similar to a VkCommandBuffer
> > (4) VULKAN // backing for VkDeviceMemory
> > (5) GL_TEXTURE // GL Texture and RenderBuffer
> > (6) GL_BUFFER // GL Buffer
> > (7) QUERY // backing for query
> > (8) DESCRIPTOR // allocations which have a lifetime similar to a VkDescriptorSet
> > (9) TRANSIENT // random transient things that the driver needs
> >
> > We are wondering if those type enumerations make sense to the upstream
> > side as well, or maybe we just deal with our own different type sets,
> > because on the Android side we'll just read those nodes named after the
> > types we defined in the sysfs node structure.
>
> See my above point of open source driver and kernel being unaware
> of the allocation purpose and use.
>
> Cheers,
> Jérôme
>
>
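
To illustrate how the nine proposed types would map to node names, here is a
small Python sketch. The class name GpuMemType and the node_name property are
made up for this example; the member names and descriptions are copied from
the list in the proposal, and the lower-casing follows the "type name ... in
lower case" rule stated there.

```python
from enum import Enum


class GpuMemType(Enum):
    """The nine proposed GPU memory object types and their descriptions."""
    UNKNOWN = "not accounted for in any other category"
    SHADER = "shader binaries"
    COMMAND = "allocations which have a lifetime similar to a VkCommandBuffer"
    VULKAN = "backing for VkDeviceMemory"
    GL_TEXTURE = "GL Texture and RenderBuffer"
    GL_BUFFER = "GL Buffer"
    QUERY = "backing for query"
    DESCRIPTOR = "allocations which have a lifetime similar to a VkDescriptorSet"
    TRANSIENT = "random transient things that the driver needs"

    @property
    def node_name(self) -> str:
        # The proposal names each sysfs node after the type, in lower case.
        return self.name.lower()


# Node names a reader on the Android side would look for under each pid.
EXPECTED_NODES = [t.node_name for t in GpuMemType]
print(EXPECTED_NODES)
```

A driver that cannot tell these categories apart could, under this scheme,
report everything under the "unknown" node, though that is only one possible
reading of type (1).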
Many thanks for the reply!
Yiwei