[Intel-gfx] [PATCH 16/17] cgroup/drm: Expose memory stats

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jul 26 16:44:28 UTC 2023


On 21/07/2023 23:21, Tejun Heo wrote:
> On Wed, Jul 12, 2023 at 12:46:04PM +0100, Tvrtko Ursulin wrote:
>>    $ cat drm.memory.stat
>>    card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
>>    card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
>>
>> Data is generated on demand for simplicity of implementation, i.e. no running
>> totals are kept or accounted during migrations and such. Various
>> optimisations such as cheaper collection of data are possible but
>> deliberately left out for now.
>>
>> Overall, the feature is deemed to be useful to container orchestration
>> software (and manual management).
>>
>> Limits, either soft or hard, are not envisaged to be implemented on top of
>> this approach due to the on-demand nature of collecting the stats.
> 
> So, yeah, if you want to add memory controls, we better think through how
> the fd ownership migration should work.

It would be quite easy to make the implicit migration fail: it is just a
matter of failing the first ioctl issued by a new owner after it gains
access to the file descriptor, since that ioctl is what triggers the
migration.
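A minimal Python model of that behaviour, for illustration only. All names here (`DrmFile`, `Cgroup`, `ioctl`, `MigrationRefused`, `allow_migration`) are hypothetical, and moving the fd's memory charge on migration is an assumption about what future memory controls might do; the RFC itself has no such charging.

```python
from dataclasses import dataclass


class MigrationRefused(Exception):
    """Raised when a (hypothetical) policy forbids implicit fd migration."""


@dataclass
class Cgroup:
    name: str
    usage_bytes: int = 0    # memory charged to this group (assumed model)


@dataclass
class DrmFile:
    owner: str              # name of the cgroup currently owning the fd
    charged_bytes: int = 0  # memory currently attributed to this fd


def ioctl(f: DrmFile, caller: str, groups: dict[str, Cgroup],
          allow_migration: bool = True) -> None:
    """Model any ioctl; the first one from a new owner migrates the fd."""
    if caller == f.owner:
        return
    if not allow_migration:
        # Refusing migration is just a matter of failing this ioctl.
        raise MigrationRefused(f"{caller} may not take over an fd owned by {f.owner}")
    # Assumed behaviour for future memory controls: move the fd's charge.
    groups[f.owner].usage_bytes -= f.charged_bytes
    groups[caller].usage_bytes += f.charged_bytes
    f.owner = caller


groups = {"xorg": Cgroup("xorg"), "app": Cgroup("app")}
fd = DrmFile(owner="xorg")  # fresh fd handed out by Xorg, nothing charged yet
ioctl(fd, "app", groups)    # first ioctl from the client migrates the fd
```

With a fresh fd nothing is charged yet, so the migration moves no memory between the two groups.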

But I don't think I can really add that to this RFC, given that it has no
hard controls or anything like that.

For GPU usage throttling it doesn't really apply, or at least I don't
think it does, since a client migrated to a lower-budget group would
simply get de-prioritized immediately.
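The reason migration needs no special handling for throttling can be sketched with a simplified weighted-fair pick. This is a stand-in for illustration, not the RFC's actual throttling mechanism: `Client`, `pick_next` and the per-group `weights` are hypothetical. Because the weight is read from the client's current group at each decision, moving to a lower-budget group de-prioritizes it at the very next pick.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    group: str            # cgroup the client currently belongs to
    gpu_time_us: int = 0  # GPU time consumed so far


def pick_next(clients: list[Client], weights: dict[str, int]) -> Client:
    """Pick the client with the lowest GPU time relative to its group's weight.

    The weight is looked up from the client's *current* group on every
    decision, so a migration takes effect at the very next pick.
    """
    return min(clients, key=lambda c: c.gpu_time_us / weights[c.group])


weights = {"high": 1000, "low": 100}  # hypothetical per-group budgets
a = Client("a", "high", gpu_time_us=500)
b = Client("b", "high", gpu_time_us=400)
first = pick_next([a, b], weights)    # b: 400/1000 < 500/1000
b.group = "low"                       # implicit migration to a lower-budget group
second = pick_next([a, b], weights)   # a: 500/1000 < 400/100
```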

I don't think hard GPU time limits are feasible in general, and while
soft ones might be, again I don't see why any limiting would necessarily
have to take effect immediately on implicit migration.

The second part of the story is hypothetical future memory controls.

I think the first thing to say is that implicit migration is important,
but using a file descriptor from two places, or migrating it more than
once, is not really an established pattern. The case that matters is
simply a fresh fd which Xorg sends to its clients, which is one of the
legacy ways of doing things.

So we can probably just ignore that: a fresh fd owns little or no memory
yet, so no significant amount of memory ownership would be migrated.

And for drm.memory.stat I think what I have is good enough: both private
and shared buffers are accounted, for any client which holds handles to
them.
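A minimal sketch of that accounting for one cgroup, assuming each client is represented by the set of buffers it holds handles to, and that a buffer counts as "shared" when more than one client holds a handle to it (in the spirit of the fdinfo `shared` category). `Buffer` and `memory_stat` are hypothetical names, and counting each buffer once per cgroup is an assumption rather than a description of the patch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    bo_id: int
    region: str
    size: int


def memory_stat(group_clients: list[set[Buffer]],
                all_clients: list[set[Buffer]]) -> dict[str, dict[str, int]]:
    """Per-region 'total' and 'shared' bytes for one cgroup.

    group_clients: handle sets of the clients in this cgroup.
    all_clients: handle sets of every client on the device.
    """
    holders: dict[Buffer, int] = defaultdict(int)
    for handles in all_clients:
        for bo in handles:
            holders[bo] += 1
    seen = set().union(*group_clients)  # each buffer counted once per cgroup
    stats: dict[str, dict[str, int]] = defaultdict(lambda: {"total": 0, "shared": 0})
    for bo in seen:
        stats[bo.region]["total"] += bo.size   # private and shared alike
        if holders[bo] > 1:                    # held by more than one client
            stats[bo.region]["shared"] += bo.size
    return dict(stats)


private = Buffer(1, "system", 4096)
shared = Buffer(2, "system", 8192)
client1 = {private, shared}  # client in the queried cgroup
client2 = {shared}           # client in some other cgroup
stat = memory_stat([client1], [client1, client2])
# {"system": {"total": 12288, "shared": 8192}}
```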

Maarten was working on memory controls, so maybe he would have more
thoughts on memory ownership and implicit migration.

But I don't think there is anything incompatible between that and
drm.memory.stat as proposed here, given that the reported categories are
the established ones from the DRM fdinfo spec, and it is a fact that a
driver can have multiple memory regions.
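For orchestration software consuming the file, a minimal parser sketch keyed by card and region. It assumes the line format of the example output quoted above (a card name followed by `key=value` pairs, with `region=` always present); `parse_memory_stat` is a hypothetical helper, not an existing tool.

```python
def parse_memory_stat(text: str) -> dict[tuple[str, str], dict[str, int]]:
    """Parse drm.memory.stat lines such as

        card0 region=system total=12898304 shared=0 ...

    into {(card, region): {category: bytes}}.
    """
    stats: dict[tuple[str, str], dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        card, pairs = fields[0], dict(f.split("=", 1) for f in fields[1:])
        region = pairs.pop("region")
        stats[(card, region)] = {k: int(v) for k, v in pairs.items()}
    return stats


sample = """\
card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
"""
stats = parse_memory_stat(sample)
```

Keying on the (card, region) pair keeps drivers with several memory regions distinct without any per-driver knowledge.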

The main thing that would change between this RFC and future memory
controls, as far as drm.memory.stat is concerned, is the implementation:
under the hood it would have to change from "collect on query" to
"account at allocation/free/etc". But those are just implementation
details.
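The difference between the two strategies can be sketched as two classes with the same interface. Both are hypothetical illustrations: one walks the live buffers on every query, the other keeps a running total updated at allocation and free, which is also where a hard limit could be checked.

```python
class CollectOnQuery:
    """Stats computed by walking live buffers whenever they are read."""

    def __init__(self) -> None:
        self.buffers: dict[int, int] = {}  # buffer id -> size in bytes

    def alloc(self, bo_id: int, size: int) -> None:
        self.buffers[bo_id] = size

    def free(self, bo_id: int) -> None:
        del self.buffers[bo_id]

    def total(self) -> int:
        # O(number of buffers) on every query, nothing kept in between.
        return sum(self.buffers.values())


class AccountOnAlloc(CollectOnQuery):
    """Same interface, but a running total is updated on alloc/free."""

    def __init__(self) -> None:
        super().__init__()
        self._total = 0

    def alloc(self, bo_id: int, size: int) -> None:
        # A hard limit could be enforced here, before the allocation succeeds.
        super().alloc(bo_id, size)
        self._total += size

    def free(self, bo_id: int) -> None:
        self._total -= self.buffers[bo_id]
        super().free(bo_id)

    def total(self) -> int:
        # O(1) query from the running total.
        return self._total


results = []
for impl in (CollectOnQuery(), AccountOnAlloc()):
    impl.alloc(1, 4096)
    impl.alloc(2, 8192)
    impl.free(1)
    results.append(impl.total())
```

Both strategies report the same value to the reader of the stats file; only the cost and timing of the bookkeeping differ.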

Regards,

Tvrtko

