[RFC PATCH v3 07/11] drm, cgroup: Add TTM buffer allocation stats
daniel at ffwll.ch
Thu Jun 27 06:01:13 UTC 2019
On Thu, Jun 27, 2019 at 12:06:13AM -0400, Kenny Ho wrote:
> On Wed, Jun 26, 2019 at 12:12 PM Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Wed, Jun 26, 2019 at 11:05:18AM -0400, Kenny Ho wrote:
> > > drm.memory.stats
> > > A read-only nested-keyed file which exists on all cgroups.
> > > Each entry is keyed by the drm device's major:minor. The
> > > following nested keys are defined.
> > >
> > > ====== =============================================
> > > system Host/system memory
> > Shouldn't that be covered by gem bo stats already? Also, system memory is
> > definitely something a lot of non-ttm drivers want to be able to track, so
> > that needs to be separate from ttm.
> The gem bo stats cover all of these types. I treat the gem stats
> as more of the front end and a hard limit, and this set of stats as the
> backing store, which can be of various types. How do non-ttm drivers
> identify various memory types?
Not explicitly, they generally just have one. I think i915 currently has
two, system and carveout (with vram getting added).
> > > tt Host memory used by the drm device (GTT/GART)
> > > vram Video RAM used by the drm device
> > > priv Other drm device, vendor specific memory
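(For reference, a read of such a nested-keyed file would look something like this; device numbers and values are purely illustrative:)

```
226:0 system=4194304 tt=1048576 vram=16777216 priv=0
226:1 system=0 tt=0 vram=0 priv=0
```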
> > So what's "priv". In general I think we need some way to register the
> > different kinds of memory, e.g. stuff not in your list:
> > - multiple kinds of vram (like numa-style gpus)
> > - cma (for all those non-ttm drivers that's a big one, it's like system
> > memory but also totally different)
> > - any carveouts and stuff
> privs are vendor specific, which is why I have truncated it. For
> example, AMD has AMDGPU_PL_GDS, GWS and OA.
> Since we are using the keyed file type, we should be able to support
> vendor-specific memory types, but I am not sure if this is acceptable to
> cgroup upstream. This is why I stuck to the 3 memory types that are
> common across all ttm drivers.
I think we'll need custom memory pools, not just priv, and I guess some
naming scheme for them. I think just exposing them as amd-gws, amd-oa,
amd-gds would make sense.
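A registration scheme along those lines could look roughly like the following. This is only a sketch to make the naming idea concrete; all struct and function names here are assumptions, not the actual RFC interface:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a vendor pool table: each driver registers its memory
 * pools under stable names, so vendor-specific ones show up as
 * e.g. "amd-gds" instead of a catch-all "priv" key. */
struct drmcg_mem_pool {
	const char *name;          /* key emitted in drm.memory.stats */
	unsigned long long usage;  /* bytes currently charged */
	unsigned long long total;  /* capacity, for a .total file */
};

/* What amdgpu might register (names and sizes are made up): */
static struct drmcg_mem_pool amdgpu_pools[] = {
	{ "system",  0, 0 },
	{ "tt",      0, 0 },
	{ "vram",    0, 0 },
	{ "amd-gds", 0, 65536 },
	{ "amd-gws", 0, 64 },
	{ "amd-oa",  0, 16 },
};

/* Emit one nested-keyed line, cgroup-v2 style: "major:minor key=val ..." */
static void drmcg_print_stats(const char *devname,
			      const struct drmcg_mem_pool *pools, int n)
{
	printf("%s", devname);
	for (int i = 0; i < n; i++)
		printf(" %s=%llu", pools[i].name, pools[i].usage);
	printf("\n");
}
```

The point is that the cgroup core only sees opaque, driver-chosen key names, so it never needs a vendor enum.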
Another thing I wonder about is multi-gpu cards, with multiple gpus that
each have their own vram and other device-specific resources. For those we'd
have node0.vram and node1.vram too (on top of maybe an overall vram node,
> > I think with all the ttm refactoring going on we need to de-ttm
> > the interface functions here a bit. With Gerd Hoffmann's series you can just
> > use a gem_bo pointer here, so what's left to do is have some extracted
> > structure for tracking memory types. I think Brian Welty has some ideas
> > for this, even in patch form. Would be good to keep him on cc at least for
> > the next version. We'd need to explicitly hand in the ttm_mem_reg (or
> > whatever the specific thing is going to be).
> I assume Gerd Hoffmann's series you are referring to is this one?
There's a newer one, much more complete, but yes that's the work.
> I can certainly keep an eye out for Gerd's refactoring while
> refactoring other parts of this RFC.
> I have added Brian and Gerd to the thread for awareness.
btw just realized that building the interfaces on top of ttm_mem_reg
is maybe not the best. That's what you're using right now, but in a way
that's just the ttm-internal detail of how the backing storage is
allocated. I think the structure we need to abstract away is
ttm_mem_type_manager, without any of the actual management details.
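Something like the following is the level of abstraction I mean: a rough sketch, with struct and field names that are assumptions, keeping only what the stats/limits side needs:

```c
#include <assert.h>

/* The stats/limits interface only needs the identity and capacity of
 * a memory type, not how ttm (or a non-ttm driver) manages it. */
struct drm_mem_region {
	const char *name;          /* "system", "tt", "vram", "amd-gds", ... */
	unsigned long long size;   /* total capacity in bytes, 0 if unbounded */
	unsigned long long used;   /* bytes charged to cgroups so far */
};

/* Headroom left in a region, clamped so it never underflows. */
static unsigned long long
drm_mem_region_available(const struct drm_mem_region *r)
{
	return r->used >= r->size ? 0 : r->size - r->used;
}
```

A ttm driver would fill one of these per ttm_mem_type_manager; a non-ttm driver just declares the regions it has.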
btw reminds me: I guess it would be good to have a per-type .total
read-only exposed, so that userspace has an idea of how much there is?
ttm is trying to be agnostic to the allocator that's used to manage a
memory type/resource, so it doesn't even know that. But I think that's
something we need to expose to admins, otherwise they can't meaningfully
set limits.
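Concretely, a read-only companion file could mirror the stats keys per device (name and values here are purely illustrative, not part of the RFC):

```
$ cat /sys/fs/cgroup/.../drm.memory.total
226:0 system=0 tt=1073741824 vram=8589934592 amd-gds=65536
```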
Software Engineer, Intel Corporation