[PATCH i-g-t] lib/igt_drm_fdinfo: Handle (somewhat) amdgpu memory stats

Lucas De Marchi lucas.demarchi at intel.com
Mon Sep 9 15:40:05 UTC 2024


On Mon, Sep 09, 2024 at 05:27:58PM GMT, Kamil Konieczny wrote:
>Hi Lucas,
>On 2024-09-05 at 07:55:14 -0500, Lucas De Marchi wrote:
>> On Wed, Sep 04, 2024 at 04:59:19PM GMT, Tvrtko Ursulin wrote:
>> > From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> >
>> > Code so far only handles the clients using the common DRM helper.
>> >
>> > Handle the amdgpu driver which uses a slightly different set of keys. More
>> > specifically, outputs drm-memory-<region> instead of drm-resident-<region>.
>> >
>> > With this added gputop starts showing resident memory usage for amdgpu.
>> >
>> > v2:
>> > * Semantics of amdgpu drm-memory- are like resident, not total.
>> >
>> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> > Cc: Alex Deucher <alexander.deucher at amd.com>
>> > Cc: Christian König <christian.koenig at amd.com>
>> > Cc: Rob Clark <robdclark at chromium.org>
>> > Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> > Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com> # v1
>>
>> my r-b stands.
>>
>> Lucas De Marchi
>
>Could you look into a regression reported for XeFULL:
>
>igt at xe_drm_fdinfo@utilization-single-full-load-destroy-queue:

This has been going on for some time and AFAICS points to a kernel bug when
destroying the exec queue: the time for that client is not accounted.
It's kind of hard to reproduce, so there must be a race hiding somewhere.

>
>https://intel-gfx-ci.01.org/tree/intel-xe/IGTPW_11696/shard-lnl-2/igt@xe_drm_fdinfo@utilization-single-full-load-destroy-queue.html

In commit 5986b136c97c ("tests/intel/xe_drm_fdinfo: Print timestamp for debug")
I added the timestamp recorded by the GPU to debug this.

(xe_drm_fdinfo:1870) DEBUG: ccs: spinner started
(xe_drm_fdinfo:1870) DEBUG: ccs: spinner ended (timestamp=4811350)
(xe_drm_fdinfo:1870) DEBUG: ccs: sample 1: cycles 0, total_cycles 8060935168
(xe_drm_fdinfo:1870) DEBUG: ccs: sample 2: cycles 0, total_cycles 8065771761
(xe_drm_fdinfo:1870) DEBUG: ccs: percent: 0.000000
(xe_drm_fdinfo:1870) CRITICAL: Test assertion failure function check_results, file ../../../usr/src/igt-gpu-tools/tests/intel/xe_drm_fdinfo.c:504:
(xe_drm_fdinfo:1870) CRITICAL: Failed assertion: 95.0 < percent
(xe_drm_fdinfo:1870) CRITICAL: error: 95.000000 >= 0.000000

From that we can see the spinner actually executed for a similar time
as in previous runs. However, the exec queue recorded 0 as its exec time.
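
For reference, a rough sketch of how the percentage in the log follows from
the two samples. This is not the actual check_results() code; the struct and
function names are made up for illustration, and it only assumes the busy
percentage is the delta of cycles over the delta of total_cycles:

```c
#include <stdint.h>

/* Hypothetical sample layout; field names mirror the debug log. */
struct sample {
	uint64_t cycles;       /* cycles accounted to the client */
	uint64_t total_cycles; /* total GPU cycles at sampling time */
};

/* Busy percentage between two samples: delta(cycles) / delta(total_cycles). */
static double busy_percent(const struct sample *s1, const struct sample *s2)
{
	uint64_t busy = s2->cycles - s1->cycles;
	uint64_t total = s2->total_cycles - s1->total_cycles;

	/* Avoid dividing by zero if both samples were taken at the same time. */
	if (!total)
		return 0.0;

	return 100.0 * (double)busy / (double)total;
}
```

With the values above, total_cycles advances by 4836593 between the samples
while cycles stays at 0, so the result is 0% no matter how long the spinner
ran.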

All this is unrelated to Tvrtko's commit. It's probably reported as a
regression because I ended up renaming the test while fixing the igt bug
that was masking this (apparently) real kernel bug.

Lucas De Marchi

>
>Regards,
>Kamil
>

