Looking for a start point for fixing a bug
Niels Ole Salscheider
niels_ole at salscheider-online.de
Sun Aug 10 23:50:32 PDT 2014
On Monday 11 August 2014, 08:03:57, Адонай Элохим wrote:
> 2014-08-11 1:53 GMT+04:00 Niels Ole Salscheider
>
> <niels_ole at salscheider-online.de>:
> > On Monday 11 August 2014, 01:19:32, Адонай Элохим wrote:
> >> Hello again, hope you are still reading my texts...
> >>
> >> I digged through the code and narrowed down the issue I wanted to fix.
> >> It appears to be related to the `bool thermal_active` dpm struct
> >> member and this piece of code:
> >>
> >> if (rdev->asic->dpm.force_performance_level) {
> >>
> >> if (rdev->pm.dpm.thermal_active) {
> >>
> >> enum radeon_dpm_forced_level level =
> >> rdev->pm.dpm.forced_level;
> >> /* force low perf level for thermal */
> >> radeon_dpm_force_performance_level(rdev,
> >>
> >> RADEON_DPM_FORCED_LEVEL_LOW);
> >>
> >> /* save the user's level */
> >> rdev->pm.dpm.forced_level = level;
> >>
> >> } else {
> >>
> >> /* otherwise, user selected level */
> >> radeon_dpm_force_performance_level(rdev,
> >>
> >> rdev->pm.dpm.forced_level); }
> >>
> >> }
> >>
> >> I did a double check here - at boot `thermal_active` is `false` and
> >> thus, performance level is properly initiated. But at resume from
> >> suspend `thermal_active` is true and performance level is strictly
> >> bound to low profile.
> >> Besides you cannot change it via echo 1 > /sys/.../force_dpm_level,
> >> again thanks to `thermal_active` checked there.
> >>
> >> Could you explain meaning of this small boolean to me? I'd like to
> >> make a small neat patch fixing this, but I'm scared of doing it in
> >> wrong way.
> >> Sorry if I'm being too persistent.
> >
> > I think thermal_active means that the temperature got too high so that low
> > clocks have to be used.
> >
> > Just some idea, but thermal.work only gets scheduled when the high to low
> > temperature interrupt occurs. When the temperature is too high before
> > suspend (so that thermal_active is true) and it gets low during standby
> > this interrupt will not occur. thermal.work is therefore not scheduled...
> >
> > Ole
>
> You were right, Ole. The driver thinks the temperature is too high.
> Thanks a lot!
> It seems the function ci_set_thermal_temperature_range is missing some
> lines:
>
>
> diff --git a/drivers/gpu/drm/radeon/ci_dpm.c
> b/drivers/gpu/drm/radeon/ci_dpm.c index 584090a..102a4bc 100644
> --- a/radeon/ci_dpm.c
> +++ b/radeon/ci_dpm.c
> @@ -869,6 +869,9 @@ static int ci_set_thermal_temperature_range(struct
> radeon_device *rdev,
> WREG32_SMC(CG_THERMAL_CTRL, tmp);
> #endif
>
> + rdev->pm.dpm.thermal.min_temp = low_temp;
> + rdev->pm.dpm.thermal.max_temp = high_temp;
> +
> return 0;
> }
>
>
> All other similar callbacks for different families of cards have these
> lines. I wonder if there is any specific case for not doing this...
This seems to be wrong indeed. I wonder why it happens after a suspend -
resume cycle, though. Does the hardware generate a corresponding interrupt
after resume?
There is however still an XXX comment in that function... Maybe Alex can
comment on that.
> How do I propose it as a patch anyway?
Fix it in your local git checkout, commit it (don't forget to pass -s to get
your signed-of-by line) and use git format-patch / git send-email to send it
to this list...
> >> Thanks,
> >> Oleg
> >>
> >> 2014-07-22 20:05 GMT+04:00 Alex Deucher <alexdeucher at gmail.com>:
> >> > On Tue, Jul 22, 2014 at 8:39 AM, Адонай Элохим <algonkvel at gmail.com>
> >
> > wrote:
> >> >> Hello all!
> >> >>
> >> >> I have some spare time and knowledge in C to try to fix some bugs I am
> >> >> seeing on my machine.
> >> >> So I've checked out and compiled all git trees that I may need and now
> >> >> I'm
> >> >> beginning to read articles.
> >> >>
> >> >> And this is the point from where I don't know where to go. I want to
> >> >> fix
> >> >> particular bug #79806 [1].
> >> >> For me there are many places where this bug can hide - mesa? dri?
> >> >> radeon
> >> >> kernel module? and I just don't know whether should I start reading
> >> >> articles about mesa hacking or about dri architecture or about kernel
> >> >> module development.
> >> >>
> >> >> Now I think the best thing for me is to start looking through radeon
> >> >> kernel
> >> >> module code (I've got ingenious idea that power management resides
> >> >> there)
> >> >> and read more about its architecture. Is this right? I mean, I just
> >> >> want
> >> >> to
> >> >> find out, is this a right place to start looking at for this bug?
> >> >
> >> > The power management is handled in the kernel driver. See radeon_pm.c
> >> > and the relevant *_dpm.c files depending on what asic you have.
> >> >
> >> > Alex
> >> >
> >> >> P.S. Sorry for my English in case it's not good, I'm learning it now
> >> >>
> >> >> P.P.S. And thanks for your hard work!
> >> >>
> >> >> -------------------------------------------
> >> >> [1] https://bugs.freedesktop.org/show_bug.cgi?id=79806
> >> >>
> >> >> _______________________________________________
> >> >> dri-devel mailing list
> >> >> dri-devel at lists.freedesktop.org
> >> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/dri-devel
More information about the dri-devel
mailing list