Non-deterministically boot into dark screen with `amdgpu`

Christian König ckoenig.leichtzumerken at gmail.com
Mon Aug 10 11:46:27 UTC 2020


Hi guys,

Am 10.08.20 um 08:43 schrieb Alexander Monakov:
> Hi,
>
> you should Cc a specialized mailing list and a relevant maintainer,
> otherwise your email is likely to be ignored as LKML is an incredibly
> high-volume list. Adding amd-gfx and Alex Deucher.

Thanks for forwarding this. AFAIK we haven't heard of this bug before,
but Alex might already know more about it.

> More thoughts below.
>
> On Sun, 9 Aug 2020, Ignat Insarov wrote:
>
>> Hello!
>>
>> This is an issue report. I am not familiar with the Linux kernel
>> development procedure, so please direct me to a more appropriate or
>> specialized medium if this is not the right avenue.
>>
>> My laptop (Ryzen 7 Pro CPU/GPU) boots into dark screen more often than
>> not. Screen blackness correlates with a line in the `systemd` journal
>> that says `RAM width Nbits DDR4`, where N is either 128 (resulting in
>> dark screen) or 64 (resulting in a healthy boot). The number seems to
>> be chosen at random with a bias towards 128. This has been going on for
>> a while, so here are some statistics:
>>
>> * 356 boots proceed far enough to attempt mode setting.
>> * 82 boots set RAM width to 64 bits and presumably succeed.
>> * 274 boots set RAM width to 128 bits and presumably fail.
>>
>> The issue is prevented with the `nomodeset` kernel option.
>>
>> I reported this previously (about a year ago) on the forum of my Linux
>> distribution.[1] The issue still persists as of Linux 5.8.0.
>>
>> The details of my graphics controller, as well as some journal
>> excerpts, can be seen at [1]. One thing that has changed since then is
>> that on failure, there now appears a null pointer dereference error. I
>> am attaching the log of kernel messages from the most recent failed
>> boot; please request more information if needed.
>>
>> I appreciate any directions and advice as to how I may go about fixing
>> this annoyance.
>>
>> [1]: https://bbs.archlinux.org/viewtopic.php?id=248273
>
> On the forum you show that in the "success" case there's one less "BIOS
> signature incorrect" message. This implies that amdgpu_get_bios() in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> gets the video BIOS from a different source. If that happens every time
> (one "signature incorrect" message for "success", two for "failure")
> that may be relevant to the problem you're experiencing.
>
> If you don't mind patching and rebuilding the kernel, I suggest adding
> debug printks to the aforementioned function to see exactly which methods
> fail with a wrong signature and which succeed.
>
> Also might be worthwhile to check if there's a BIOS update for your laptop.

It might also be a good idea to try the latest amd-staging-drm-next
branch from Alex's repository (bear with me, I don't have the link at hand,
but it should be easy to find).

Opening a bug report or searching the existing ones for something 
similar under https://gitlab.freedesktop.org/drm/amd/-/issues might be a 
good idea as well.

And I completely agree that this sounds like an issue getting the BIOS 
image.
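
To check whether the number of "BIOS signature incorrect" messages really
tracks the reported RAM width on every boot, the saved kernel logs could be
tallied before patching anything. The following is only a minimal sketch in
Python, assuming each boot's kernel log is available as plain text (for
example from `journalctl -k -b N`). The function names and the sample log
lines are hypothetical; only the two message fragments are taken from the
report, and the text around them in a real log may differ.

```python
import re
from collections import Counter

# Message fragments quoted in the report; the surrounding log text is assumed.
RAM_WIDTH_RE = re.compile(r"RAM width (\d+)bits")
SIGNATURE_MSG = "BIOS signature incorrect"


def summarize_boot(log_text):
    """Return (RAM width or None, number of signature-incorrect lines)."""
    match = RAM_WIDTH_RE.search(log_text)
    width = int(match.group(1)) if match else None
    errors = sum(1 for line in log_text.splitlines() if SIGNATURE_MSG in line)
    return width, errors


def tally(boot_logs):
    """Count how often each (width, errors) pair occurs across boots."""
    return Counter(summarize_boot(text) for text in boot_logs)


# Hypothetical sample logs, not taken from the reporter's machine.
good = "amdgpu: BIOS signature incorrect 0 0\namdgpu: RAM width 64bits DDR4\n"
bad = (
    "amdgpu: BIOS signature incorrect 0 0\n"
    "amdgpu: BIOS signature incorrect 0 0\n"
    "amdgpu: RAM width 128bits DDR4\n"
)

counts = tally([good, bad, bad])
for (width, errors), boots in counts.most_common():
    print(f"width={width} signature_errors={errors} boots={boots}")
```

If every 128-bit boot shows two signature messages and every 64-bit boot
shows one, that would support the idea that the BIOS image comes from a
different source in the failing case. With the numbers from the report,
274 of 356 boots (about 77%) fall into the 128-bit group.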

Thanks,
Christian.

>
> Alexander
