PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

Felix Richter judge at felixrichter.tech
Sat Jun 3 14:52:00 UTC 2023


Hi Guys,

Sorry for the silence from my side. I had a lot of things to take care 
of after returning from vacation. Also, I had to wait for the zfs modules 
to be updated to support kernel 6.3 before I could test further.

The bad news is that I am still experiencing issues. I have been able to 
find a reproducible trigger for the buggy behavior. The moment I take a 
screenshot, or any other program such as `wdisplays` accesses the screen 
buffer, the screen starts flickering. The only way to reset it is to 
reboot the machine or log out of the desktop.

With this trigger I did a bisection to figure out which commit is 
responsible. I attached the logs to the mail. The short version is that I 
identified commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb as the 
culprit. It seems that the more flexible buffer placement has side 
effects in the case of the integrated GPU. To verify that this actually 
is the cause of the issue, I built the current Arch Linux kernel with an 
extra patch to revert the commit: 
https://github.com/ju6ge/linux/tree/v6.3.5-ju6ge. The result is that the 
bug is fixed!
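
For anyone unfamiliar with how a bisection narrows a large range down to 
one commit, here is a minimal sketch of the underlying binary search. It 
is only an illustration, not the commands used for the attached log: the 
commit names c0..c16, the helper `first_bad` and the `is_bad` check are 
all made up, and it assumes a linear history in which every commit after 
the first bad one is also bad.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # in practice: build, boot, take a screenshot
        if is_bad(commits[mid]):
            hi = mid  # flicker seen: culprit is at mid or earlier
        else:
            lo = mid  # no flicker: culprit is after mid
    return commits[hi], tests


# Hypothetical history: c0 is good, c16 is bad, c11 stands in for the culprit.
history = [f"c{i}" for i in range(17)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, tests)
```

Each test halves the remaining range, so the number of kernels to build 
grows only with the logarithm of the number of commits in the range.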

Whether this is the desired long-term fix, I do not know …

Kind regards,
Felix Richter

On 02.05.23 16:12, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 02.05.23 15:48, Felix Richter wrote:
>> On 5/2/23 15:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 02.05.23 15:13, Alex Deucher wrote:
>>>> On Tue, May 2, 2023 at 7:45 AM Linux regression tracking (Thorsten
>>>> Leemhuis)<regressions at leemhuis.info>  wrote:
>>>>
>>>>> On 30.04.23 13:44, Felix Richter wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am running into an issue with the integrated GPU of the Ryzen 9
>>>>>> 7950X. It seems to be a regression from kernel version 6.1 to 6.2.
>>>>>> The bug materializes in the form of my monitor blinking, meaning it
>>>>>> briefly turns fully white. This happens so often that the
>>>>>> system becomes unpleasant to use.
>>>>>>
>>>>>> I am running the Archlinux Kernel:
>>>>>> The Issue happens on the bleeding edge kernel: 6.2.13
>>>>>> Switching back to the LTS kernel resolves the issue: 6.1.26
>>>>>>
>>>>>> I have two monitors attached to the system. One 42 inch 4k Display
>>>>>> and a 24 inch 1080p Display and am running sway as my desktop.
>>>>>>
>>>>>> Let me know if there is more information I could provide to help
>>>>>> narrow down the issue.
>>>>> Thanks for the report. To be sure the issue doesn't fall through the
>>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>>>>> tracking bot:
>>>>>
>>>>> #regzbot ^introduced v6.1..v6.2
>>>>> #regzbot title drm: amdgpu: system becomes unpleasant to use after
>>>>> monitor starts blinking and turns full white
>>>>> #regzbot ignore-activity
>>>>>
>>>>> This isn't a regression? This issue or a fix for it are already
>>>>> discussed somewhere else? It was fixed already? You want to clarify
>>>>> when the regression started to happen? Or point out I got the title
>>>>> or something else totally wrong? Then just reply and tell me --
>>>>> ideally while also telling regzbot about it, as explained by the page
>>>>> listed in the footer of this mail.
>>>>>
>>>>> Developers: When fixing the issue, remember to add 'Link:' tags
>>>>> pointing to the report (the parent of this mail). See page linked in
>>>>> footer for details.
>>>> This sounds exactly like the issue that was fixed in this patch which
>>>> is already on its way to Linus:
>>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/08da182175db4c7f80850354849d95f2670e8cd9
>>> FWIW, you in the flood of emails likely missed that this is the same
>>> thread where you yesterday replied "If the module parameter didn't help
>>> then perhaps you are seeing some other issue.  Can you bisect?". That's
>>> why I decided to add this to the tracking. Or am I missing something
>>> obvious here?
>>>
>>> /me looks around again and can't see anything, but that doesn't have to
>>> mean anything...
>>>
>>> Felix, btw, this guide might help you with the bisection, even if it's
>>> just for kernel compilation:
>>>
>>> https://docs.kernel.org/next/admin-guide/quickly-build-trimmed-linux.html
>>>
>>> And to indirectly reply to your mail from yesterday[1]. You might want
>>> to ignore the arch linux kernel git repo and just do a bisection between
>>> 6.1 and the latest 6.2.y kernel using upstream repos; and if I were you
>>> I'd also try 6.3 or even mainline before that, in case the issue was
>>> fixed already.
>>>
>>> [1]
>>> https://lore.kernel.org/all/04749ee4-0728-92fe-bcb0-a7320279eaac@felixrichter.tech/
>>>
>> Thanks for the pointers, I'll do a bisection on my desktop from 6.1 to
>> the newest commit.
> FWIW, I wonder what you actually mean with "newest commit" here: a
> bisection between 6.1 and mainline HEAD might be a waste of time, *if*
> this is something that only happens in 6.2.y (say due to a broken or
> incomplete backport)
>
>> That was the part I was mostly unsure about … where
>> to start from.
>>
>> I was planning to use PKGBUILD scripts from arch to achieve the same
>> configuration as I would when installing
>> the package and just rewrite the script to use a local copy of the
>> source code instead of the repository.
>> That way I can just use the bisect command, rebuild the package and test
>> again.
> In my experience trying to deal with Linux distro's package managers
> creates more trouble than it's worth.
>
>> But I probably won't be able to finish it this week, since I am on
>> vacation starting tomorrow and will not have access to the computer in
>> question. I will be back next week, by that time the patch Alex is
>> talking about might already be in mainline. So if that fixes it, I will
>> notice and let you know. If not I will do the bisection to figure out
>> what the actual issue is.
> Enjoy your vacation!
>
> Ciao, Thorsten
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bisect_final.log
Type: text/x-log
Size: 2476 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20230603/ea9a1517/attachment.bin>
-------------- next part --------------
81d0bcf9900932633d270d5bc4a54ff599c6ebdb is the first bad commit
commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb
Author: Alex Deucher <alexander.deucher at amd.com>
Date:   Wed Dec 7 11:08:53 2022 -0500

    drm/amdgpu: make display pinning more flexible (v2)
    
    Only apply the static threshold for Stoney and Carrizo.
    This hardware has certain requirements that don't allow
    mixing of GTT and VRAM.  Newer asics do not have these
    requirements so we should be able to be more flexible
    with where buffers end up.
    
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291
    Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255
    Acked-by: Luben Tuikov <luben.tuikov at amd.com>
    Reviewed-by: Christian König <christian.koenig at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
    Cc: stable at vger.kernel.org

 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

