Looking for a start point for fixing a bug

Alex Deucher alexdeucher at gmail.com
Mon Aug 11 07:49:40 PDT 2014


On Mon, Aug 11, 2014 at 12:03 AM, Адонай Элохим <algonkvel at gmail.com> wrote:
> 2014-08-11 1:53 GMT+04:00 Niels Ole Salscheider
> <niels_ole at salscheider-online.de>:
>> On Monday 11 August 2014, 01:19:32, Адонай Элохим wrote:
>>> Hello again, hope you are still reading my texts...
>>>
>>> I digged through the code and narrowed down the issue I wanted to fix.
>>> It appears to be related to the `bool thermal_active` dpm struct
>>> member and this piece of code:
>>>
>>> if (rdev->asic->dpm.force_performance_level) {
>>>         if (rdev->pm.dpm.thermal_active) {
>>>             enum radeon_dpm_forced_level level = rdev->pm.dpm.forced_level;
>>>             /* force low perf level for thermal */
>>>             radeon_dpm_force_performance_level(rdev,
>>> RADEON_DPM_FORCED_LEVEL_LOW);
>>>             /* save the user's level */
>>>             rdev->pm.dpm.forced_level = level;
>>>         } else {
>>>             /* otherwise, user selected level */
>>>             radeon_dpm_force_performance_level(rdev,
>>> rdev->pm.dpm.forced_level); }
>>>     }
>>>
>>> I did a double check here - at boot `thermal_active` is `false` and
>>> thus, performance level is properly initiated. But at resume from
>>> suspend `thermal_active` is true and performance level is strictly
>>> bound to low profile.
>>> Besides you cannot change it via echo 1 > /sys/.../force_dpm_level,
>>> again thanks to `thermal_active` checked there.
>>>
>>> Could you explain meaning of this small boolean to me? I'd like to
>>> make a small neat patch fixing this, but I'm scared of doing it in
>>> wrong way.
>>> Sorry if I'm being too persistent.
>>
>> I think thermal_active means that the temperature got too high so that low
>> clocks have to be used.
>>
>> Just some idea, but thermal.work only gets scheduled when the high to low
>> temperature interrupt occurs. When the temperature is too high before suspend
>> (so that thermal_active is true) and it gets low during standby this interrupt
>> will not occur. thermal.work is therefore not scheduled...
>>
>> Ole
>>
>
> You were right, Ole. The driver thinks the temperature is too high.
> Thanks a lot!
> It seems the function ci_set_thermal_temperature_range is missing some lines:
>
>
> diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
> index 584090a..102a4bc 100644
> --- a/radeon/ci_dpm.c
> +++ b/radeon/ci_dpm.c
> @@ -869,6 +869,9 @@ static int ci_set_thermal_temperature_range(struct
> radeon_device *rdev,
>         WREG32_SMC(CG_THERMAL_CTRL, tmp);
>  #endif
>
> +    rdev->pm.dpm.thermal.min_temp = low_temp;
> +    rdev->pm.dpm.thermal.max_temp = high_temp;
> +
>         return 0;
>  }
>
>
> All other similar callbacks for different families of cards have these
> lines. I wonder if there is any specific case for not doing this...
> How do I propose it as a patch anyway?

You analysis is correct.  Thanks for tracking this down.

Alex

>
>>> Thanks,
>>> Oleg
>>>
>>> 2014-07-22 20:05 GMT+04:00 Alex Deucher <alexdeucher at gmail.com>:
>>> > On Tue, Jul 22, 2014 at 8:39 AM, Адонай Элохим <algonkvel at gmail.com>
>> wrote:
>>> >> Hello all!
>>> >>
>>> >> I have some spare time and knowledge in C to try to fix some bugs I am
>>> >> seeing on my machine.
>>> >> So I've checked out and compiled all git trees that I may need and now
>>> >> I'm
>>> >> beginning to read articles.
>>> >>
>>> >> And this is the point from where I don't know where to go. I want to fix
>>> >> particular bug #79806 [1].
>>> >> For me there are many places where this bug can hide - mesa? dri? radeon
>>> >> kernel module? and I just don't know whether should I start reading
>>> >> articles about mesa hacking or about dri architecture or about kernel
>>> >> module development.
>>> >>
>>> >> Now I think the best thing for me is to start looking through radeon
>>> >> kernel
>>> >> module code (I've got ingenious idea that power management resides there)
>>> >> and read more about its architecture. Is this right? I mean, I just want
>>> >> to
>>> >> find out, is this a right place to start looking at for this bug?
>>> >
>>> > The power management is handled in the kernel driver.  See radeon_pm.c
>>> > and the relevant *_dpm.c files depending on what asic you have.
>>> >
>>> > Alex
>>> >
>>> >> P.S. Sorry for my English in case it's not good, I'm learning it now
>>> >>
>>> >> P.P.S. And thanks for your hard work!
>>> >>
>>> >> -------------------------------------------
>>> >> [1] https://bugs.freedesktop.org/show_bug.cgi?id=79806
>>> >>
>>> >> _______________________________________________
>>> >> dri-devel mailing list
>>> >> dri-devel at lists.freedesktop.org
>>> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>


More information about the dri-devel mailing list