[PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings

Christian König christian.koenig at amd.com
Wed Sep 8 12:40:18 UTC 2021


Well that's complete nonsense, in that case we should rather go back to 
the s*printf functionality.

The problem is that we are not using the sysfs_emit/sysfs_emit_at as 
intended, e.g. something like this:

offs = sysfs_emit(page, "first...);
offs += sysfs_emit_at(page, offs, "second....);
offs += sysfs_emit_at(page, offs, "third....);
...

This is usually done inside the same function and not spread around in 
the driver.

Regards,
Christian.

Am 08.09.21 um 12:55 schrieb Yu, Lang:
> [AMD Official Use Only]
>
> Or try to send a patch to remove page boundary aligned limitation. Any considerations? Thanks.
>
> int sysfs_emit(char *buf, const char *fmt, ...)
>   {
>          va_list args;
> -       int len;
> +       int len, offset;
>
> -       if (WARN(!buf || offset_in_page(buf),
> +       offset = offset_in_page(buf);
> +
> +       if (WARN(!buf,
>                   "invalid sysfs_emit: buf:%p\n", buf))
>                  return 0;
>
>          va_start(args, fmt);
> -       len = vscnprintf(buf, PAGE_SIZE, fmt, args);
> +       len = vscnprintf(buf, PAGE_SIZE - offset, fmt, args);
>          va_end(args);
>
>          return len;
> @@ -760,14 +762,16 @@ EXPORT_SYMBOL_GPL(sysfs_emit);
>   int sysfs_emit_at(char *buf, int at, const char *fmt, ...)
>   {
>          va_list args;
> -       int len;
> +       int len, offset;
> +
> +       offset = offset_in_page(buf);
>
> -       if (WARN(!buf || offset_in_page(buf) || at < 0 || at >= PAGE_SIZE,
> +       if (WARN(!buf || at < 0 || at + offset >= PAGE_SIZE,
>                   "invalid sysfs_emit_at: buf:%p at:%d\n", buf, at))
>                  return 0;
>
>          va_start(args, fmt);
> -       len = vscnprintf(buf + at, PAGE_SIZE - at, fmt, args);
> +       len = vscnprintf(buf + at, PAGE_SIZE - at - offset, fmt, args);
>          va_end(args);
>
>          return len;
>
> Regards,
> Lang
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>> Sent: Wednesday, September 8, 2021 6:22 PM
>> To: Christian König <ckoenig.leichtzumerken at gmail.com>; Yu, Lang
>> <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>> <Ray.Huang at amd.com>; Tian Tao <tiantao6 at hisilicon.com>; Powell, Darren
>> <Darren.Powell at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings
>>
>>
>>
>> On 9/8/2021 3:08 PM, Christian König wrote:
>>> Am 08.09.21 um 11:29 schrieb Lazar, Lijo:
>>>>
>>>> On 9/8/2021 2:32 PM, Yu, Lang wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>>> Sent: Wednesday, September 8, 2021 4:55 PM
>>>>>> To: Yu, Lang <Lang.Yu at amd.com>; Christian König
>>>>>> <ckoenig.leichtzumerken at gmail.com>; amd-gfx at lists.freedesktop.org
>>>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>>>>> <Ray.Huang at amd.com>; Tian Tao <tiantao6 at hisilicon.com>
>>>>>> Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at
>>>>>> warnings
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/8/2021 1:14 PM, Yu, Lang wrote:
>>>>>>> [AMD Official Use Only]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>>>>> Sent: Wednesday, September 8, 2021 3:36 PM
>>>>>>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>; Yu, Lang
>>>>>>>> <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>>>>>>> <Ray.Huang at amd.com>; Tian Tao <tiantao6 at hisilicon.com>
>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: fix sysfs_emit/sysfs_emit_at
>>>>>>>> warnings
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/8/2021 12:07 PM, Christian König wrote:
>>>>>>>>> Am 08.09.21 um 07:56 schrieb Lang Yu:
>>>>>>>>>> sysfs_emit and sysfs_emit_at requrie a page boundary aligned
>>>>>>>>>> buf address. Make them happy!
>>>>>>>>>>
>>>>>>>>>> Warning Log:
>>>>>>>>>> [  492.545174] invalid sysfs_emit_at: buf:00000000f19bdfde at:0
>>>>>>>>>> [ 492.546416] WARNING: CPU: 7 PID: 1304 at fs/sysfs/file.c:765
>>>>>>>>>> sysfs_emit_at+0x4a/0xa0
>>>>>>>>>> [  492.654805] Call Trace:
>>>>>>>>>> [  492.655353]  ? smu_cmn_get_metrics_table+0x40/0x50 [amdgpu]
>>>>>>>>>> [ 492.656780]  vangogh_print_clk_levels+0x369/0x410 [amdgpu] [
>>>>>>>>>> 492.658245]  vangogh_common_print_clk_levels+0x77/0x80 [amdgpu]
>>>>>>>>>> [ 492.659733]  ? preempt_schedule_common+0x18/0x30 [
>>>>>>>>>> 492.660713]
>>>>>>>>>> smu_print_ppclk_levels+0x65/0x90 [amdgpu] [ 492.662107]
>>>>>>>>>> amdgpu_get_pp_od_clk_voltage+0x13d/0x190 [amdgpu] [ 492.663620]
>>>>>>>>>> dev_attr_show+0x1d/0x40
>>>>>>>>> Mhm, that at least partially doesn't looks like the right
>>>>>>>>> approach to me.
>>>>>>>>>
>>>>>>>>> Why do we have string printing and sysfs code in the hardware
>>>>>>>>> version specific backend in the first place?
>>>>>>>>>
>>>>>>>> This is a callback meant for printing ASIC specific information
>>>>>>>> to sysfs node. The buffer passed in sysfs read is passed as it is
>>>>>>>> to the callback API.
>>>>>>>>
>>>>>>>>> That stuff needs to be implemented for each hardware generation
>>>>>>>>> and is now cluttered with sysfs buffer offset calculations.
>>>>>>>>>
>>>>>>>> Looks like the warning happened because of this usage.
>>>>>>>>
>>>>>>>>                    size = amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_SCLK, buf);
>>>>>>>>                    size += amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_MCLK,
>>>>>>>> buf+size);
>>>>>>>>                    size += amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_VDDC_CURVE, buf+size);
>>>>>>>>                    size += amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_VDDGFX_OFFSET, buf+size);
>>>>>>>>                    size += amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_RANGE,
>>>>>>>> buf+size);
>>>>>>>>                    size += amdgpu_dpm_print_clock_levels(adev,
>>>>>>>> OD_CCLK,
>>>>>>>> buf+size);
>>>>>>>>
>>>>>>>>
>>>>>>> [Yu, Lang]
>>>>>>> Yes. So it is fine we just fix the caller
>>>>>>> amdgpu_get_pp_od_clk_voltage like
>>>>>> following:
>>>>>>> static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
>>>>>>>          struct device_attribute *attr,
>>>>>>>          char *buf)
>>>>>>> {
>>>>>>>      struct drm_device *ddev = dev_get_drvdata(dev);
>>>>>>>      struct amdgpu_device *adev = drm_to_adev(ddev);
>>>>>>>      ssize_t size, offset;
>>>>>>>      int ret, i;
>>>>>>>      char temp_buf[512];
>>>>>>>      char clock_type[] = {OD_SCLK, OD_MCLK, OD_VDDC_CURVE,
>>>>>>>                           OD_VDDGFX_OFFSET, OD_RANGE, OD_CCLK};
>>>>>>>
>>>>>>>      if (amdgpu_in_reset(adev))
>>>>>>>          return -EPERM;
>>>>>>>      if (adev->in_suspend && !adev->in_runpm)
>>>>>>>          return -EPERM;
>>>>>>>
>>>>>>>      ret = pm_runtime_get_sync(ddev->dev);
>>>>>>>      if (ret < 0) {
>>>>>>>          pm_runtime_put_autosuspend(ddev->dev);
>>>>>>>          return ret;
>>>>>>>      }
>>>>>>>
>>>>>>>      offset = 0;
>>>>>>>
>>>>>>>      if (adev->powerplay.pp_funcs->print_clock_levels) {
>>>>>>>          for (i = 0; i < ARRAY_SIZE(clock_type); i++) {
>>>>>>>              size = amdgpu_dpm_print_clock_levels(adev,
>>>>>> clock_type[i], buf);
>>>>>>>              if (offset + size > PAGE_SIZE)
>>>>>>>                  break;
>>>>>>>              memcpy(temp_buf + offset, buf, size);
>>>>>>>              offset += size;
>>>>>>>          }
>>>>>>>          memcpy(buf, temp_buf, offset);
>>>>>>>          size = offset;
>>>>>>>      } else {
>>>>>>>          size = sysfs_emit(buf, "\n");
>>>>>>>      }
>>>>>>>      pm_runtime_mark_last_busy(ddev->dev);
>>>>>>>      pm_runtime_put_autosuspend(ddev->dev);
>>>>>>>
>>>>>>>      return size;
>>>>>>> }
>>>>>>>
>>>>>> Prefer to avoid any extra stack or heap usage for buffer. Maybe
>>>>>> another arg to pass offset along with buf?
>>>>>>
>>>>> [Yu, Lang]
>>>>> Actually, the buf address contains the offset(offset_in_page(buf)) .
>>>> Though it's not a problem based on codeflow, static analysis tools
>>>> might complain.
>>>>
>>>>> Or we just rollback to sprintf/snprintf.
>>>>>
>>>> snprintf with (PAGE_SIZE-size) may be simpler. I think Darren took
>>>> the effort to convert these, he may have some other ideas.
>>> This is not what I meant. See from the design point of view the
>>> print_clock_levels() callback is the bad idea to begin with.
>>>
>>> What we should have instead is a callback which returns the clock as a
>>> value which is then printed in the amdgpu_get_pp_od_clk_voltage() function.
>>>
>>> This avoids passing around the buffer and remaining size everywhere
>>> and also guarantees that the sysfs have unified printing over all
>>> hardware generations.
>>>
>> The scenario is one node used for multiple parameters - OD_SCLK, OD_CCLK,
>> OD_VDDGFX_OFFSET etc.(mostly to avoid cluttering sysfs with lots of nodes). On
>> top of it, the parameters supported (for ex: CCLK is not valid on dGPUs),  the
>> number of levels supported etc. vary across ASICs. There has to be multiple calls
>> or the call needs to return multiple values for a single parameter (for ex: up to 4,
>> 8 or 16 levels of GFXCLK depending on ASIC).
>>
>> I don't know the history of the callback, mostly it was considered more efficient
>> to print it directly rather than fetch and print. Alex/Evan may know the details.
>>
>> Thanks,
>> Lijo
>>
>>> Regards,
>>> Christian.
>>>



More information about the amd-gfx mailing list