Proposal to report GPU private memory allocations with sysfs nodes

Yiwei Zhang zzyiwei at google.com
Tue Nov 12 18:17:10 UTC 2019


Hi folks,

What do you think about:
> For the sysfs approach, I'm assuming the upstream vendors still need
> to provide a matching pair of UMD and KMD, and that the ioctl used to
> label a BO stays a driver-private ioctl. Would each driver then define
> its own set of "label"s, with the KMD consuming only the ones it
> recognizes, so that the set of sysfs nodes never changes at all? A node
> would simply report zero if there is no allocation or re-use under a
> particular "label".

Best,
Yiwei

On Wed, Nov 6, 2019 at 11:21 AM Yiwei Zhang <zzyiwei at google.com> wrote:
>
> For the sysfs approach, I'm assuming the upstream vendors still need
> to provide a matching pair of UMD and KMD, and that the ioctl used to
> label a BO stays a driver-private ioctl. Would each driver then define
> its own set of "label"s, with the KMD consuming only the ones it
> recognizes, so that the set of sysfs nodes never changes at all? A node
> would simply report zero if there is no allocation or re-use under a
> particular "label".
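>
> To make the idea concrete, here is a rough, purely illustrative sketch
> of what such a driver-private labeling ioctl could look like. None of
> the names, fields or ioctl numbers below exist today; they are made up
> for discussion only, and it assumes the usual DRM uapi header:
>
> #include <drm/drm.h>
>
> /* Hypothetical driver-private ioctl for attaching a label to a BO. */
> struct drm_foo_gem_set_label {
> 	__u32 handle; /* GEM handle of the BO to label */
> 	__u32 len;    /* length of the label string, including the NUL */
> 	__u64 label;  /* userspace pointer to the label string */
> };
>
> #define DRM_FOO_GEM_SET_LABEL 0x10 /* made-up driver ioctl offset */
> #define DRM_IOCTL_FOO_GEM_SET_LABEL \
> 	DRM_IOWR(DRM_COMMAND_BASE + DRM_FOO_GEM_SET_LABEL, \
> 		 struct drm_foo_gem_set_label)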
>
> A separate thought: do the GPU memory allocations deserve a node under
> /proc/<pid> for per-process tracking? If the structure can stay similar
> to what "maps" or "smaps" look like, then we can easily bookkeep all
> BOs with a label. For the multi-GPU scenario, maybe have something like
> "/proc/<pid>/gpu_mem/<gpu_id>/maps" along with a global table somewhere
> specifying the {gpu_id, device_name} pairs, while the global GPU
> allocation summary info still lives under
> "/sys/devices/<device_name>/gpu_mem/". How difficult would it be to
> define such a procfs node structure? Just curious.
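>
> Purely as an illustration of the shape I have in mind (the paths and
> the location of the {gpu_id, device_name} table are made up, not a
> concrete proposal):
>
> /proc/<pid>/gpu_mem/0/maps /* per-BO entries for gpu_id 0, maps/smaps-like */
> /proc/<pid>/gpu_mem/1/maps /* per-BO entries for gpu_id 1 */
> /sys/devices/<device_name_0>/gpu_mem/ /* global summary for gpu_id 0 */
> /sys/devices/<device_name_1>/gpu_mem/ /* global summary for gpu_id 1 */
> /sys/class/gpu_mem/gpus /* hypothetical global {gpu_id, device_name} table */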
>
> Thanks for all the comments and replies!
>
> Best regards,
> Yiwei
>
>
> On Wed, Nov 6, 2019 at 8:55 AM Rob Clark <robdclark at gmail.com> wrote:
> >
> > On Tue, Nov 5, 2019 at 1:47 AM Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > On Mon, Nov 04, 2019 at 11:34:33AM -0800, Yiwei Zhang wrote:
> > > > Hi folks,
> > > >
> > > > (Daniel, I just moved you to this thread)
> > > >
> > > > Below are the latest thoughts based on all the feedback and comments.
> > > >
> > > > First, I need to clarify the gpu memory object type enumeration. We
> > > > don't want to enforce a single enumeration across upstream and
> > > > Android; the types should stay configurable and flexible.
> > > >
> > > > Second, this effort should also be useful to other memory management
> > > > tools that look at PSS. At a minimum, an additional node is needed
> > > > for the part of the gpu private allocation that is not mapped to
> > > > userspace (invisible to PSS). This is especially critical for
> > > > downstream Android, so that the low-memory killer (lmkd) is aware of
> > > > the actual total memory for a process and knows how much gets freed
> > > > up if it kills that process. This is an effort to demystify the "lost
> > > > RAM".
> > > >
> > > > Given the above, the new node structure would look like this:
> > > >
> > > > Global nodes:
> > > > /sys/devices/<root>/gpu_mem/global/total /* Total private allocation;
> > > > this node is the coherent total and should also include the anonymous
> > > > memory allocated in the kmd */
> > > > /sys/devices/<root>/gpu_mem/global/total_unmapped /* Accounts for the
> > > > private allocation not mapped to userspace (not visible to PSS); it
> > > > does not need to be coherent with the "total" node. lmkd or an
> > > > equivalent service looking at PSS only needs to look at this node in
> > > > addition. */
> > > > /sys/devices/<root>/gpu_mem/global/<type1> /* One total value per
> > > > type; this should also include the anonymous memory allocated in the
> > > > kmd (or maybe a separate anonymous type for the global nodes) */
> > > > /sys/devices/<root>/gpu_mem/global/<type2> /* One total value per type */
> > > > ...
> > > > /sys/devices/<root>/gpu_mem/global/<typeN> /* One total value per type */
> > > >
> > > > Per-process nodes:
> > > > /sys/devices/<root>/gpu_mem/proc/<pid>/total /* Total private
> > > > allocation; this node is the coherent total */
> > > > /sys/devices/<root>/gpu_mem/proc/<pid>/total_unmapped /* Accounts for
> > > > the private allocation not mapped to userspace (not visible to PSS);
> > > > it does not need to be coherent with the "total" node. lmkd or an
> > > > equivalent service looking at PSS only needs to look at this node in
> > > > addition. */
> > > > /sys/devices/<root>/gpu_mem/proc/<pid>/<type1> /* One total value per type */
> > > > /sys/devices/<root>/gpu_mem/proc/<pid>/<type2> /* One total value per type */
> > > > ...
> > > > /sys/devices/<root>/gpu_mem/proc/<pid>/<typeN> /* One total value per type */
> > > >
> > > > For downstream Android, type1 through typeN will be the enumerations
> > > > I mentioned in the original email: unknown, shader, ..., transient.
> > > > For upstream, those can be the labeled BOs or any other customized
> > > > types.
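> > > >
> > > > As a minimal userspace sketch of how an lmkd-like service could
> > > > consume the proposed "total_unmapped" node (the device path below is
> > > > just a stand-in for whatever <root> ends up being; nothing here is an
> > > > existing interface):
> > > >
> > > > #include <stdio.h>
> > > >
> > > > static long long read_gpu_mem_node(const char *path)
> > > > {
> > > > 	long long val = -1;
> > > > 	FILE *f = fopen(path, "r");
> > > >
> > > > 	if (!f)
> > > > 		return -1;
> > > > 	if (fscanf(f, "%lld", &val) != 1)
> > > > 		val = -1;
> > > > 	fclose(f);
> > > > 	return val;
> > > > }
> > > >
> > > > int main(void)
> > > > {
> > > > 	/* Hypothetical node path following the structure proposed above. */
> > > > 	long long unmapped = read_gpu_mem_node(
> > > > 		"/sys/devices/platform/gpu0/gpu_mem/global/total_unmapped");
> > > >
> > > > 	if (unmapped >= 0)
> > > > 		printf("GPU private memory invisible to PSS: %lld bytes\n",
> > > > 		       unmapped);
> > > > 	return 0;
> > > > }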
> > > >
> > > > Look forward to the comments and feedback!
> > >
> > > I don't think this will work well, at least for upstream:
> > >
> > > - The labels are currently free-form; baking them back into your structure
> > >   would mean we'd need to do lots of hot add/remove of sysfs directory
> > >   trees, which sounds like a really bad idea :-/
> >
> > also, a bo's label can change over time if it is re-used for a
> > different purpose.. not sure what the overhead of sysfs add/remove is,
> > but I don't think I want that overhead in the bo_reuse path
> >
> > (maybe that matters less for vk, where we aren't using a userspace bo cache)
> >
> > BR,
> > -R
> >
> > > - Buffer objects aren't attached to pids, but to files. And files can be
> > >   shared. If we want to list this somewhere outside of debugfs, we need to
> > >   tie this into the files somehow (so proc), except the underlying files
> > >   are all anon inodes, so I think this gets really tricky to make work
> > >   well.
> > >
> > > Cheers, Daniel
> > >
> > > >
> > > > Best regards,
> > > > Yiwei
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Nov 1, 2019 at 1:37 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:
> > > > >
> > > > > On Thu, 31 Oct 2019 13:57:00 -0400
> > > > > Kenny Ho <y2kenny at gmail.com> wrote:
> > > > >
> > > > > > Hi Yiwei,
> > > > > >
> > > > > > This is the latest series:
> > > > > > https://patchwork.kernel.org/cover/11120371/
> > > > > >
> > > > > > (I still need to reply to some of the feedback.)
> > > > > >
> > > > > > Regards,
> > > > > > Kenny
> > > > > >
> > > > > > On Thu, Oct 31, 2019 at 12:59 PM Yiwei Zhang <zzyiwei at google.com> wrote:
> > > > > > >
> > > > > > > Hi Kenny,
> > > > > > >
> > > > > > > Thanks for the info. Do you mind forwarding the existing discussion to me or having me cc'ed in that thread?
> > > > > > >
> > > > > > > Best,
> > > > > > > Yiwei
> > > > > > >
> > > > > > > On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho <y2kenny at gmail.com> wrote:
> > > > > > >>
> > > > > > >> Hi Yiwei,
> > > > > > >>
> > > > > > >> I am not sure if you are aware, but there is an ongoing RFC on adding
> > > > > > >> drm support in cgroup for the purpose of resource tracking.  One of
> > > > > > >> the resources is GPU memory.  It's not exactly the same as what you
> > > > > > >> are proposing (it doesn't track API usage, but it tracks the type of
> > > > > > >> GPU memory from the kmd's perspective), but perhaps it would be of
> > > > > > >> interest to you.  There is no consensus on it at this point.
> > > > >
> > > > > Hi Yiwei,
> > > > >
> > > > > I'd like to point out an effort to have drivers label BOs for debugging
> > > > > purposes: https://lists.freedesktop.org/archives/dri-devel/2019-October/239727.html
> > > > >
> > > > > I don't know if it would work, but an obvious idea might be to use
> > > > > those labels for tracking the kinds of buffers - a piece of UAPI which I
> > > > > believe you are still missing.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > pq
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch

