Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

Armin Wolf W_Armin at gmx.de
Sat May 18 23:59:37 UTC 2024


On 17.05.24 at 03:30, Barry Kauler wrote:

> Armin, Yifan, Prike,
> I will top-post, so you don't have to scroll down.
> After identifying the commit that causes black screen with my gpu, I
> posted the result to you guys, on May 9.
> It is now May 17 and no reply.
> OK, I have now created a patch that reverts Yifan's commit, compiled
> 5.15.158, and my gpu now works.
> Note, the radeon module is not loaded, so it is not a factor.
> I'm not a kernel developer. I have identified the culprit and it is up
> to you guys to fix it, Yifan especially, as you are the person who has
> created the regression.
> I will attach my patch.
> Regards,
> Barry Kauler

Hi,

sorry for not responding to your findings. I normally do not work with GPU drivers,
so I hoped one of the amdgpu developers would handle this.

I CCed dri-devel at lists.freedesktop.org and amd-gfx at lists.freedesktop.org so that other
amdgpu developers hear about this issue.

Thank you for your persistence in finding the offending commit.
Armin Wolf

>
> On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler at gmail.com> wrote:
>> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin at gmx.de> wrote:
>>>> ...
>>>> # lspci | grep VGA
>>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
>>>> Series] (rev c2)
>>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
>>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
>>>>
>>>> # lspci -n -k
>>>> ...
>>>> 05:00.0 0300: 1002:15d8 (rev c2)
>>>> Subsystem: 1025:1456
>>>> Kernel driver in use: amdgpu
>>>> Kernel modules: amdgpu
>>>> ...
>>> thanks for informing us of this regression. Since there are four commits affecting
>>> amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty commits,
>>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
>>>
>>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
>>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
>>> man page of "git bisect" for details.
>>>
>>> Thanks,
>>> Armin Wolf
>> Armin,
>> Thanks for the advice. I am unfamiliar with git on the commandline.
>> Previously only used SmartGit gui.
>> EasyOS requires aufs patch, and for a few days tried to figure out how
>> to use that with git bisect, then gave up. Changed to testing with my
>> "QV" distro, which is more conventional, doesn't need any kernel
>> patches. Managed to get it down to one commit. Here are the steps I
>> followed:
>>
>> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>> # cd linux-stable
>> # git tag -l | grep '5\.15\.150'
>> v5.15.150
>> # git checkout -b my5.15.150 v5.15.150
>> Updating files: 100% (65776/65776), done.
>> Switched to a new branch 'my5.15.150'
>>
>> Copied in my .config then...
>>
>> # make menuconfig
>> # git bisect start -- drivers/gpu/drm/amd
>> # git bisect bad
>> # git bisect good v5.15.149
>> Bisecting: 1 revision left to test after this (roughly 1 step)
>> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
>> s3 suspend abort case
>> # make
>> # rm -rf /boot2
>> # mkdir -p /boot2/lib/modules
>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>> # sync
>> ...QV on Acer laptop, with amdgpu, works!
>> # git bisect good
>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
>> after amdkfd device init
>> # make
>> # mkdir -p /boot2/lib/modules
>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>> # sync
>> ...QV on Acer laptop, black screen!
>>
>> # git bisect bad
>> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
>> commit 56b522f4668167096a50c39446d6263c96219f5f
>> Author: Yifan Zhang <yifan1.zhang at amd.com>
>> Date:   Tue Sep 28 15:42:35 2021 +0800
>>
>>      drm/amdgpu: init iommu after amdkfd device init
>>
>>      [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
>>
>>      This patch is to fix clinfo failure in Raven/Picasso:
>>
>>      Number of platforms: 1
>>        Platform Profile: FULL_PROFILE
>>        Platform Version: OpenCL 2.2 AMD-APP (3364.0)
>>        Platform Name: AMD Accelerated Parallel Processing
>>        Platform Vendor: Advanced Micro Devices, Inc.
>>        Platform Extensions: cl_khr_icd cl_amd_event_callback
>>
>>        Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
>>
>>      Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com>
>>      Reviewed-by: James Zhu <James.Zhu at amd.com>
>>      Tested-by: James Zhu <James.Zhu at amd.com>
>>      Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>      Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>      Signed-off-by: Sasha Levin <sashal at kernel.org>
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> Anything else I should do, to identify what in this commit is the
>> likely culprit?
>> Regards,
>> Barry Kauler
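
For readers unfamiliar with the path-limited bisect quoted above, here is a minimal sketch in a throwaway repository. Everything in it is an assumption made for illustration: the temporary repository, the `state` and `unrelated` files, the `known-good` tag, and the "works"/"black" markers are hypothetical stand-ins for the real kernel tree and the manual boot test. It also uses `git bisect run` to automate the good/bad verdict, which the steps in the thread did by hand.

```shell
#!/bin/sh
# Toy, self-contained illustration of a path-limited bisect.
# The repository, file names and "works"/"black" markers are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Baseline commit: the (pretend) driver state is fine.
mkdir -p drivers/gpu/drm/amd
echo works > drivers/gpu/drm/amd/state
echo 1 > unrelated
git add .
git commit -q -m "baseline: screen works"
git tag known-good

# A commit outside the driver directory.
echo 2 > unrelated
git commit -q -am "unrelated change"

# The commit that introduces the (pretend) regression.
echo black > drivers/gpu/drm/amd/state
git commit -q -am "regression: black screen"
bad_commit=$(git rev-parse HEAD)

# Another commit outside the driver directory.
echo 3 > unrelated
git commit -q -am "another unrelated change"

# Limit the search to commits that touch drivers/gpu/drm/amd.
git bisect start HEAD known-good -- drivers/gpu/drm/amd >/dev/null
# Exit status 0 marks a revision good, 1 marks it bad.
git bisect run grep -q works drivers/gpu/drm/amd/state >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
```

On a real kernel tree the automated check is usually not possible for a black-screen bug, so each step still needs a build, install, and boot followed by a manual `git bisect good` or `git bisect bad`, as in the quoted session.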

