[PATCH] drm/amdgpu: Fix regression in adjusting power table/profile

Matt Coffin mcoffin13 at gmail.com
Fri Jul 31 00:34:14 UTC 2020


Hey Pawel,

I did confirm that this patch *introduced* the issue, both by bisecting
and by testing with it reverted.

Now, there are a lot of fragile pieces in the dpm handling, so it could
be this patch's interaction with something else that's causing it, and it
may well not be the fault of this code, but this is the patch that
introduced the issue.

I'll have some more time tomorrow to try to get down to root cause here,
so maybe I'll have more to offer then.

Thanks for taking a look,
Matt

On 7/30/20 6:31 PM, Paweł Gronowski wrote:
> Hello Matt,
> 
> Thank you for testing. It seems that my GPU (RX 570) does not support the
> vc setting, so I cannot reproduce the issue exactly. However, I did trace the
> code path the test case takes, and it seems to pass correctly through the while
> loop that parses the input and to fail only in amdgpu_dpm_odn_edit_dpm_table.
> The 'parameter' array is populated the same way as in the original code. Since
> amdgpu_dpm_odn_edit_dpm_table is reached, I think that your problem is
> unfortunately caused by something else.
> 
> 
> Paweł Gronowski
> 
> On Thu, Jul 30, 2020 at 08:49:41AM -0600, Matt Coffin wrote:
>> Hello all, I just did some testing with this applied, and while it no
>> longer returns -EINVAL, running `sudo sh -c 'echo "vc 2 2150 1195" >
>> /sys/class/drm/card1/device/pp_od_clk_voltage'` results in `sh` spiking
>> to, and staying at, 100% CPU usage, with nothing relevant from the kernel
>> in `dmesg`.
>>
>> It appeared to work at least ONCE, but potentially not after.
>>
>> This is not unique to Navi, and caused the problem on a POLARIS10 card
>> as well.
>>
>> Sorry for the bad news, and thanks for any insight you may have,
>> Matt Coffin
>>
>> On 7/29/20 8:53 PM, Alex Deucher wrote:
>>> On Wed, Jul 29, 2020 at 10:20 PM Paweł Gronowski <me at woland.xyz> wrote:
>>>>
>>>> Regression was introduced in commit 38e0c89a19fd
>>>> ("drm/amdgpu: Fix NULL dereference in dpm sysfs handlers") which
>>>> made the set_pp_od_clk_voltage and set_pp_power_profile_mode return
>>>> -EINVAL for previously valid input. This was caused by an empty
>>>> string (starting at the \0 character) being passed to kstrtol.
>>>>
>>>> Signed-off-by: Paweł Gronowski <me at woland.xyz>
>>>
>>> Applied.  Thanks!
>>>
>>> Alex
>>>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 9 +++++++--
>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>> index ebb8a28ff002..cbf623ff03bd 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>>>> @@ -778,12 +778,14 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev,
>>>>                 tmp_str++;
>>>>         while (isspace(*++tmp_str));
>>>>
>>>> -       while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>> +       while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>                 ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>                 if (ret)
>>>>                         return -EINVAL;
>>>>                 parameter_size++;
>>>>
>>>> +               if (!tmp_str)
>>>> +                       break;
>>>>                 while (isspace(*tmp_str))
>>>>                         tmp_str++;
>>>>         }
>>>> @@ -1635,11 +1637,14 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
>>>>                         i++;
>>>>                 memcpy(buf_cpy, buf, count-i);
>>>>                 tmp_str = buf_cpy;
>>>> -               while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
>>>> +               while ((sub_str = strsep(&tmp_str, delimiter)) && *sub_str) {
>>>>                         ret = kstrtol(sub_str, 0, &parameter[parameter_size]);
>>>>                         if (ret)
>>>>                                 return -EINVAL;
>>>>                         parameter_size++;
>>>> +
>>>> +                       if (!tmp_str)
>>>> +                               break;
>>>>                         while (isspace(*tmp_str))
>>>>                                 tmp_str++;
>>>>                 }
>>>> --
>>>> 2.25.1
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>
> 
> 
> 
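For anyone following along, the -EINVAL regression described in the commit
message can be reproduced in userspace. What follows is only a rough sketch
under stated assumptions, not the driver code: parse_params() and its
`patched` flag are hypothetical names, strtol() stands in for the kernel's
kstrtol(), and the delimiter set " \n" is assumed so that a trailing newline
from `echo` leaves an empty final token. The flag toggles only the added
`&& *sub_str` check; the `if (!tmp_str) break;` guard is kept in both modes
so the sketch never dereferences NULL. It says nothing about the 100% CPU
symptom, whose cause is still unknown.

```c
#define _DEFAULT_SOURCE         /* exposes strsep() on glibc */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the sysfs parsing loop.
 * Returns the number of values stored in out[], or -EINVAL.
 * The input buffer is modified in place, as strsep() requires.
 */
static int parse_params(char *input, long *out, int max, int patched)
{
	const char *delimiter = " \n";  /* assumed delimiter set */
	char *tmp_str = input;
	char *sub_str;
	int n = 0;

	assert(input && out);

	while ((sub_str = strsep(&tmp_str, delimiter)) != NULL) {
		char *end;
		long v;

		/* The patch's extra loop condition: stop on an empty token. */
		if (patched && *sub_str == '\0')
			break;

		errno = 0;
		v = strtol(sub_str, &end, 0);
		/* Reject empty tokens and trailing garbage, like kstrtol(). */
		if (end == sub_str || *end != '\0' || errno)
			return -EINVAL;
		if (n == max)
			return -EINVAL;
		out[n++] = v;

		/* strsep() sets tmp_str to NULL after the last token. */
		if (!tmp_str)
			break;
		while (isspace((unsigned char)*tmp_str))
			tmp_str++;
	}
	return n;
}
```

With the buffer "2 2150 1195\n", the final strsep() call returns an empty
string. Without the extra check that token reaches the number parser and the
whole write is rejected; with it, the loop ends after three values.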
