Regression in 6.6: trying to set DPMS mode kills radeon (r600)

Holger Hoffstätte holger at applied-asynchrony.com
Tue Dec 19 21:27:50 UTC 2023


On 2023-12-19 19:46, Alex Deucher wrote:
> On Mon, Dec 18, 2023 at 1:52 PM Holger Hoffstätte
> <holger at applied-asynchrony.com> wrote:
>>
>> On 2023-12-16 18:36, Holger Hoffstätte wrote:
>>
>> <snip>
>>> The affected machine is an older SandyBridge desktop with a fanless
>>> r600 Redwood GPU, using the radeon driver. "Recently" - some time
>>> after the last few 6.6.x stable updates - it started to die with GPU
>>> lockups. I first blamed this on standby/resume - because why not? - but
>>> this turned out to be wrong; the real culprit is DPMS.
>>>
>>> I use xfce-power-manager as "screensaver" to turn off the display after
>>> inactivity. This can be configured in two ways: "suspend" and "poweroff".
>>> I've been using "poweroff" since forever without problems, until now.
>>>
>>> The symptom is that everything works fine until the screensaver kicks in
>>> and tries to turn the monitor off, which sends the radeon driver and the GPU
>>> into a complete tailspin.
>>
>> <snip>
>>
>>> Eventually the screensaver tries to switch off the monitor via the DPMS "poweroff" method and
>>> this greatly upsets the GPU:
>>>
>>> Dec 12 20:39:59 ragnarok kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10140msec
>>> Dec 12 20:39:59 ragnarok kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000003 on ring 0)
>>
>> In the meantime I have confirmed that all this is still more complicated:
>> even using the "suspend" method only works after boot, not after a system suspend
>> cycle. Yes, weird but reproducible.
>>
>> I have tried to chase down the problematic release, and as suspected this
>> started to happen with 6.6.5; 6.6.4 is fine.
>>
>> Based on this information I found the offending commits and reverted them
>> in order from 6.6.7, which fixes everything for me:
>>
>> b0399e22ada0 "drm/amd/display: Remove power sequencing check"
>> 45f98fccb1f6 "drm/amd/display: Refactor edp power control"
> 
> Those patches are for amdgpu.  From the logs in your original post,
> you are using the radeon driver.  The two are completely separate
> drivers.  I don't see how those patches could be related.  That code
> would never even execute.

Hi,

I understand the difference between amdgpu and radeon, that's why I was
wondering why those patches would make a difference.

The crash/no-crash behaviour was definitely reproducible - same config
and clean rebuild every time etc. My only guess was that maybe one of the
touched headers got included in the drm-display-helper used by radeon as
well, but that is seemingly not the case either.

In any case, it seems that whatever was going on is fixed in stable-6.6.8-rc1;
at least I haven't been able to reproduce the lockup so far, with various
combinations of display suspend/resume. There's at least one EDID-related patch
in 6.6.8 but I don't understand enough about the various display technologies to
assess whether that could have played a role.

You can probably imagine how frustrating it is to have a GPU that deadlocks while
_not_ doing anything. At least it seems to be working again now, either way.

Thanks for reading!

cheers
Holger

