[PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings

Lazar, Lijo lijo.lazar at amd.com
Wed Sep 8 10:22:14 UTC 2021



On 9/8/2021 3:08 PM, Christian König wrote:
> Am 08.09.21 um 11:29 schrieb Lazar, Lijo:
>>
>>
>> On 9/8/2021 2:32 PM, Yu, Lang wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>> Sent: Wednesday, September 8, 2021 4:55 PM
>>>> To: Yu, Lang <Lang.Yu at amd.com>; Christian König
>>>> <ckoenig.leichtzumerken at gmail.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>>> <Ray.Huang at amd.com>; Tian Tao <tiantao6 at hisilicon.com>
>>>> Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings
>>>>
>>>>
>>>>
>>>> On 9/8/2021 1:14 PM, Yu, Lang wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>>> Sent: Wednesday, September 8, 2021 3:36 PM
>>>>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>; Yu, Lang
>>>>>> <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>>>>> <Ray.Huang at amd.com>; Tian Tao <tiantao6 at hisilicon.com>
>>>>>> Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at
>>>>>> warnings
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/8/2021 12:07 PM, Christian König wrote:
>>>>>>> Am 08.09.21 um 07:56 schrieb Lang Yu:
>>>>>>>> sysfs_emit and sysfs_emit_at requrie a page boundary aligned buf
>>>>>>>> address. Make them happy!
>>>>>>>>
>>>>>>>> Warning Log:
>>>>>>>> [  492.545174] invalid sysfs_emit_at: buf:00000000f19bdfde at:0 [
>>>>>>>> 492.546416] WARNING: CPU: 7 PID: 1304 at fs/sysfs/file.c:765
>>>>>>>> sysfs_emit_at+0x4a/0xa0
>>>>>>>> [  492.654805] Call Trace:
>>>>>>>> [  492.655353]  ? smu_cmn_get_metrics_table+0x40/0x50 [amdgpu] [
>>>>>>>> 492.656780]  vangogh_print_clk_levels+0x369/0x410 [amdgpu] [
>>>>>>>> 492.658245]  vangogh_common_print_clk_levels+0x77/0x80 [amdgpu] [
>>>>>>>> 492.659733]  ? preempt_schedule_common+0x18/0x30 [ 492.660713]
>>>>>>>> smu_print_ppclk_levels+0x65/0x90 [amdgpu] [ 492.662107]
>>>>>>>> amdgpu_get_pp_od_clk_voltage+0x13d/0x190 [amdgpu] [ 492.663620]
>>>>>>>> dev_attr_show+0x1d/0x40
>>>>>>>
>>>>>>> Mhm, that at least partially doesn't looks like the right 
>>>>>>> approach to me.
>>>>>>>
>>>>>>> Why do we have string printing and sysfs code in the hardware
>>>>>>> version specific backend in the first place?
>>>>>>>
>>>>>>
>>>>>> This is a callback meant for printing ASIC specific information to
>>>>>> sysfs node. The buffer passed in sysfs read is passed as it is to 
>>>>>> the callback API.
>>>>>>
>>>>>>> That stuff needs to be implemented for each hardware generation and
>>>>>>> is now cluttered with sysfs buffer offset calculations.
>>>>>>>
>>>>>>
>>>>>> Looks like the warning happened because of this usage.
>>>>>>
>>>>>>                   size = amdgpu_dpm_print_clock_levels(adev, 
>>>>>> OD_SCLK, buf);
>>>>>>                   size += amdgpu_dpm_print_clock_levels(adev, 
>>>>>> OD_MCLK,
>>>>>> buf+size);
>>>>>>                   size += amdgpu_dpm_print_clock_levels(adev,
>>>>>> OD_VDDC_CURVE, buf+size);
>>>>>>                   size += amdgpu_dpm_print_clock_levels(adev,
>>>>>> OD_VDDGFX_OFFSET, buf+size);
>>>>>>                   size += amdgpu_dpm_print_clock_levels(adev,
>>>>>> OD_RANGE,
>>>>>> buf+size);
>>>>>>                   size += amdgpu_dpm_print_clock_levels(adev, 
>>>>>> OD_CCLK,
>>>>>> buf+size);
>>>>>>
>>>>>>
>>>>> [Yu, Lang]
>>>>> Yes. So it is fine we just fix the caller 
>>>>> amdgpu_get_pp_od_clk_voltage like
>>>> following:
>>>>>
>>>>> static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
>>>>>         struct device_attribute *attr,
>>>>>         char *buf)
>>>>> {
>>>>>     struct drm_device *ddev = dev_get_drvdata(dev);
>>>>>     struct amdgpu_device *adev = drm_to_adev(ddev);
>>>>>     ssize_t size, offset;
>>>>>     int ret, i;
>>>>>     char temp_buf[512];
>>>>>     char clock_type[] = {OD_SCLK, OD_MCLK, OD_VDDC_CURVE,
>>>>>                          OD_VDDGFX_OFFSET, OD_RANGE, OD_CCLK};
>>>>>
>>>>>     if (amdgpu_in_reset(adev))
>>>>>         return -EPERM;
>>>>>     if (adev->in_suspend && !adev->in_runpm)
>>>>>         return -EPERM;
>>>>>
>>>>>     ret = pm_runtime_get_sync(ddev->dev);
>>>>>     if (ret < 0) {
>>>>>         pm_runtime_put_autosuspend(ddev->dev);
>>>>>         return ret;
>>>>>     }
>>>>>
>>>>>     offset = 0;
>>>>>
>>>>>     if (adev->powerplay.pp_funcs->print_clock_levels) {
>>>>>         for (i = 0; i < ARRAY_SIZE(clock_type); i++) {
>>>>>             size = amdgpu_dpm_print_clock_levels(adev,
>>>> clock_type[i], buf);
>>>>>             if (offset + size > PAGE_SIZE)
>>>>>                 break;
>>>>>             memcpy(temp_buf + offset, buf, size);
>>>>>             offset += size;
>>>>>         }
>>>>>         memcpy(buf, temp_buf, offset);
>>>>>         size = offset;
>>>>>     } else {
>>>>>         size = sysfs_emit(buf, "\n");
>>>>>     }
>>>>>     pm_runtime_mark_last_busy(ddev->dev);
>>>>>     pm_runtime_put_autosuspend(ddev->dev);
>>>>>
>>>>>     return size;
>>>>> }
>>>>>
>>>> Prefer to avoid any extra stack or heap usage for buffer. Maybe 
>>>> another arg to
>>>> pass offset along with buf?
>>>>
>>> [Yu, Lang]
>>> Actually, the buf address contains the offset(offset_in_page(buf)) .
>>
>> Though it's not a problem based on codeflow, static analysis tools 
>> might complain.
>>
>>> Or we just rollback to sprintf/snprintf.
>>>
>>
>> snprintf with (PAGE_SIZE-size) may be simpler. I think Darren took the 
>> effort to convert these, he may have some other ideas.
> 
> This is not what I meant. See from the design point of view the 
> print_clock_levels() callback is the bad idea to begin with.
> 
> What we should have instead is a callback which returns the clock as a 
> value which is then printed in the amdgpu_get_pp_od_clk_voltage() function.
> 
> This avoids passing around the buffer and remaining size everywhere and 
> also guarantees that the sysfs have unified printing over all hardware 
> generations.
> 

The scenario is one node used for multiple parameters - OD_SCLK, 
OD_CCLK, OD_VDDGFX_OFFSET etc.(mostly to avoid cluttering sysfs with 
lots of nodes). On top of it, the parameters supported (for ex: CCLK is 
not valid on dGPUs),  the number of levels supported etc. vary across 
ASICs. There has to be multiple calls or the call needs to return 
multiple values for a single parameter (for ex: up to 4, 8 or 16 levels 
of GFXCLK depending on ASIC).

I don't know the history of the callback, mostly it was considered more 
efficient to print it directly rather than fetch and print. Alex/Evan 
may know the details.

Thanks,
Lijo

> Regards,
> Christian.
> 


More information about the amd-gfx mailing list