Proposal to report GPU private memory allocations with sysfs nodes [plain text version]

Mon Oct 28 15:26:02 UTC 2019

On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
> Hi folks,
> 
> This is the plain text version of the previous email in case that was
> considered as spam.
> 
> --- Background ---
> On the downstream Android, vendors used to report GPU private memory
> allocations with debugfs nodes in their own formats. However, debugfs nodes
> are getting deprecated in the next Android release.

Maybe explain why it is useful first ?

> 
> --- Proposal ---
> We are taking the chance to unify all the vendors to migrate their existing
> debugfs nodes into a standardized sysfs node structure. Then the platform
> is able to do a bunch of useful things: memory profiling, system health
> coverage, field metrics, local shell dump, in-app api, etc. This proposal
> is better served upstream as all GPU vendors can standardize a gpu memory
> structure and reduce fragmentation across Android and Linux that clients
> can rely on.
> 
> --- Detailed design ---
> The sysfs node structure looks like below:
> /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
> e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer" and the gl_buffer is a node
> having the comma separated size values: "4096,81920,...,4096".

How does kernel knows what API the allocation is use for ? With the
open source driver you never specify what API is creating a gem object
(opengl, vulkan, ...) nor what purpose (transient, shader, ...).

> For the top level root, vendors can choose their own names based on the
> value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple gpu driver
> cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the 2nd and 3rd KMDs.
> (2) It's also allowed to put some sub-dir for example "kgsl/gpu_mem" or
> "mali0/gpu_mem" in the ro.gfx.sysfs.<channel> property if the root name
> under /sys/devices/ is already created and used for other purposes.

On one side you want to standardize on the other you want to give
complete freedom on the top level naming scheme. I would rather see a
consistent naming scheme (ie something more restraint and with little
place for interpration by individual driver)
.

> For the 2nd level "pid", there are usually just a couple of them per
> snapshot, since we only takes snapshot for the active ones.

? Do not understand here, you can have any number of applications with
GPU objects ? And thus there is no bound on the number of PID. Please
consider desktop too, i do not know what kind of limitation android
impose.

> For the 3rd level "type_name", the type name will be one of the GPU memory
> object types in lower case, and the value will be a comma separated
> sequence of size values for all the allocations under that specific type.
> 
> We especially would like some comments on this part. For the GPU memory
> object types, we defined 9 different types for Android:
> (1) UNKNOWN // not accounted for in any other category
> (2) SHADER // shader binaries
> (3) COMMAND // allocations which have a lifetime similar to a
> VkCommandBuffer
> (4) VULKAN // backing for VkDeviceMemory
> (5) GL_TEXTURE // GL Texture and RenderBuffer
> (6) GL_BUFFER // GL Buffer
> (7) QUERY // backing for query
> (8) DESCRIPTOR // allocations which have a lifetime similar to a
> VkDescriptorSet
> (9) TRANSIENT // random transient things that the driver needs
>
> We are wondering if those type enumerations make sense to the upstream side
> as well, or maybe we just deal with our own different type sets. Cuz on the
> Android side, we'll just read those nodes named after the types we defined
> in the sysfs node structure.

See my above point of open source driver and kernel being unaware
of the allocation purpose and use.

Cheers,
Jérôme