[PATCH v6 4/5] drm: add drm_memory_stats_is_zero

Tvrtko Ursulin tvrtko.ursulin at igalia.com
Thu Nov 7 14:43:55 UTC 2024


On 07/11/2024 14:17, Li, Yunxiang (Teddy) wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Sent: Thursday, November 7, 2024 5:41
>> On 25/10/2024 18:41, Yunxiang Li wrote:
>>> Add a helper to check if the memory stats is zero, this will be used
>>> to check for memory accounting errors.
>>>
>>> Signed-off-by: Yunxiang Li <Yunxiang.Li at amd.com>
>>> ---
>>>    drivers/gpu/drm/drm_file.c | 9 +++++++++
>>>    include/drm/drm_file.h     | 1 +
>>>    2 files changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index 714e42b051080..75ed701d80f74 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char
>> *stat,
>>>      drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]);
>>>    }
>>>
>>> +int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) {
>>> +   return (stats->shared == 0 &&
>>> +           stats->private == 0 &&
>>> +           stats->resident == 0 &&
>>> +           stats->purgeable == 0 &&
>>> +           stats->active == 0);
>>> +}
>>
>> Could use mem_is_zero() for some value of source/binary compactness.
> 
> Yeah, the patch set started out with that when it's just a function in amdgpu, but Christ didn't like it.

Okay, I don't feel so strongly about the implementation details.

>>> +EXPORT_SYMBOL(drm_memory_stats_is_zero);
>>> +
>>
>> I am not a huge fan of adding this as an interface as the only caller appears to be a
>> sanity check in amdgpu_vm_fini():
>>
>>        if (!amdgpu_vm_stats_is_zero(vm))
>>                dev_err(adev->dev, "VM memory stats is non-zero when fini\n");
>>
>> But I guess there is some value in sanity checking since amdgpu does not have a
>> notion of debug only code (compiled at production and exercised via a test suite).
>>
>> I do suggest to demote the dev_err to notice log level would suffice and be more
>> accurate.
> 
> I think it's very important to have a check like this when we have a known invariant, especially in this case where there's stat tracking code spread out everywhere and we have very little chance of catching a bug right when it happened. And since whenever this check fails we know for sure there is a bug, I don't see the harm of keeping it as an error.
It would indeed be a programming error if it can happen, but from the 
point of view of a driver and system log I think a warning is actually 
right.

Regards,

Tvrtko

> 
> Now that I think about it, I probably want to have the process & task name in here to aid in reproduction.
> 
> Teddy


More information about the amd-gfx mailing list