[PATCH 0/5] 0 MHz is not a valid current frequency

Russell, Kent Kent.Russell at amd.com
Tue Oct 19 13:25:50 UTC 2021


[AMD Official Use Only]

It was the rocm-smi -c flag. Maybe some work has since been done to make it more robust; that would be nice. But the -c flag is supposed to show the current frequency for each clock type. -g would do the same, but only for SCLK.

Kent

From: Tuikov, Luben <Luben.Tuikov at amd.com>
Sent: Tuesday, October 19, 2021 12:27 AM
To: Russell, Kent <Kent.Russell at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan at amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

Kent,

What is the command which fails?
I can try to duplicate it here.

So far, with the things I've tried, I cannot make rocm-smi fail. What command arguments did you use?

Regards,
Luben

On 2021-10-18 21:06, Russell, Kent wrote:

The * is required for rocm-smi's functionality that shows the current clocks. We previously had a bug where the * was removed, and the SMI died fantastically. Work could be done to handle that type of situation, but the SMI has a "show current clocks" option and uses the * to determine which level is active.
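To make the dependency concrete, here is a minimal C sketch of a consumer that locates the active level by its `*` marker. It is not rocm-smi's code; the helper name `find_current_level` is hypothetical, and the input is assumed to be pp_dpm_sclk-style text as shown further down the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: return the index of the
 * DPM level whose line carries a '*', or -1 when no line is marked.
 * Not taken from rocm-smi.
 */
static int find_current_level(const char *text)
{
    const char *line = text;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        unsigned int level;

        /* A level line looks like "<index>: <freq>Mhz", optionally " *". */
        if (sscanf(line, "%u:", &level) == 1 && memchr(line, '*', len))
            return (int)level;

        line = end ? end + 1 : NULL; /* advance to the next line */
    }
    return -1; /* no marker: the active level cannot be determined */
}
```

A list with no `*` at all makes the helper return -1, so any caller that assumes a marked level is always present has no current clock to show.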

Kent

From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Russell, Kent
Sent: Monday, October 18, 2021 9:05 PM
To: Tuikov, Luben <Luben.Tuikov at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan at amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency

+Harish, rocm-smi falls under his purview now.

Kent

From: Tuikov, Luben <Luben.Tuikov at amd.com>
Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander <Alexander.Deucher at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org; Russell, Kent <Kent.Russell at amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

I think Kent has already seen these patches, as he commented on patch 1/5.

The v3 version of the patch, posted last week, removes the asterisk to report the lowest frequency as the current frequency, when the current frequency is 0, i.e., when the block is in a low-power state. Does the tool rely on the asterisk? If this information is necessary, could it not use amdgpu_pm_info?

Regards,
Luben

On 2021-10-18 16:19, Deucher, Alexander wrote:

[Public]

The current behavior (0 for the clock) already crashes the tool, so I don't think we can really make things worse.

Alex

________________________________
From: Quan, Evan <Evan.Quan at amd.com>
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo <Lijo.Lazar at amd.com>; Tuikov, Luben <Luben.Tuikov at amd.com>; amd-gfx at lists.freedesktop.org; Russell, Kent <Kent.Russell at amd.com>
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency

+Kent, who maintains the ROCm tool.

From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben <Luben.Tuikov at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

>Or maybe just a list without default hint, i.e. no asterisk?

I think this is also fine; it would mean we are having trouble determining the current frequency or DPM level. Evan/Alex? I don't know whether this will crash the tools.

Thanks,
Lijo

________________________________

From: Tuikov, Luben <Luben.Tuikov at amd.com>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASICs support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It manipulates the FW-reported value. If there is an uncaught
> issue in the FW's reporting of frequency values, it is masked here.
> 2) Otherwise, if 0MHz is defined as the GFX power-gated case, this
> provides a convenient interface to check whether GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>        "NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not in increasing order. Since 0 < 550, but 0 is listed as the second entry, the tool simply asserts and crashes.
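The ordering requirement can be sketched in C as follows. This is a hypothetical check (the name `levels_in_increasing_order` and the array input are assumptions, not the ROCm tool's code); a tool that passes its result to `assert()` would abort on the 0 MHz reading shown in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check, for illustration only: accept a DPM level list
 * only if its frequencies never decrease from one entry to the next.
 */
static bool levels_in_increasing_order(const unsigned int *mhz, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        /* 500 followed by 0 (the low-power reading) fails here */
        if (mhz[i] < mhz[i - 1])
            return false;
    }
    return true;
}
```

With the reported list {500, 0, 2200} the check fails; with the rounded list {500, 2200} it passes.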

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200Mhz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)
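Why "N/A" would break a `%uMhz` consumer can be shown with a hypothetical line parser. `parse_level_line` is an assumed name, not taken from any real tool, and it simply reads each entry with `sscanf`, which requires a number where "N/A" now appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical parser, for illustration only: read "<index>: <n>Mhz".
 * sscanf returns the number of successful conversions, so a line whose
 * frequency field is not a number yields fewer than two.
 */
static bool parse_level_line(const char *line, unsigned int *index,
                             unsigned int *mhz)
{
    return sscanf(line, "%u: %u", index, mhz) == 2;
}
```

`"0: 550Mhz"` parses to index 0 and 550 MHz, while `"1: N/A *"` stops after the index and is rejected.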

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200Mhz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>    drm/amd/pm: Slight function rename
>>    drm/amd/pm: Rename cur_value to curr_value
>>    drm/amd/pm: Rename freq_values --> freq_value
>>    dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>    dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +++++++++------
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ++++++++++++-------
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>
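The rounding proposed in the cover letter can be sketched as a formatter. This is not the code from the navi10/sienna_cichlid patches: `format_sclk` is a hypothetical name, and the model (a min/max pair plus a middle entry only when the current clock lies strictly between them) is an assumption inferred from the "before" and "after" outputs quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: print a min/max clock list in pp_dpm_sclk style.
 * A current frequency of 0 MHz (low-power state) is first rounded to
 * the smallest operating frequency. Returns snprintf's result.
 */
static int format_sclk(char *buf, size_t size, unsigned int min_mhz,
                       unsigned int max_mhz, unsigned int curr_mhz)
{
    const char *mark_min = "";
    const char *mark_max = "";

    if (curr_mhz == 0)
        curr_mhz = min_mhz; /* 0 MHz is not an operating frequency */

    if (curr_mhz == min_mhz)
        mark_min = " *";
    else if (curr_mhz == max_mhz)
        mark_max = " *";
    else /* current clock sits between the two levels */
        return snprintf(buf, size, "0: %uMhz\n1: %uMhz *\n2: %uMhz\n",
                        min_mhz, curr_mhz, max_mhz);

    return snprintf(buf, size, "0: %uMhz%s\n1: %uMhz%s\n",
                    min_mhz, mark_min, max_mhz, mark_max);
}
```

Called with min 500, max 2200 and a current reading of 0, it produces the two-line "after" listing with the asterisk on 500Mhz; a current reading of 1200 keeps the three-line form.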

