Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

Alex Deucher alexdeucher at gmail.com
Mon May 20 16:22:20 UTC 2024


On Sat, May 18, 2024 at 8:17 PM Armin Wolf <W_Armin at gmx.de> wrote:
>
> On 17.05.24 at 03:30, Barry Kauler wrote:
>
> > Armin, Yifan, Prike,
> > I will top-post, so you don't have to scroll down.
> > After identifying the commit that causes black screen with my gpu, I
> > posted the result to you guys, on May 9.
> > It is now May 17 and no reply.
> > OK, I have now created a patch that reverts Yifan's commit, compiled
> > 5.15.158, and my gpu now works.
> > Note, the radeon module is not loaded, so it is not a factor.
> > I'm not a kernel developer. I have identified the culprit and it is up
> > to you guys to fix it, Yifan especially, as you are the person who has
> > created the regression.
> > I will attach my patch.
> > Regards,
> > Barry Kauler
>
> Hi,
>
> Sorry for not responding to your findings. I normally do not work with GPU drivers,
> so I hoped one of the amdgpu developers would handle this.
>
> I CCed dri-devel at lists.freedesktop.org and amd-gfx at lists.freedesktop.org so that other
> amdgpu developers hear about this issue.
>
> Thank you for your persistence in finding the offending commit.

Likely this patch should not have been ported to 5.15 in the first
place.  The IOMMU requirements have been dropped from the driver for
the last few kernel versions so it is no longer relevant on newer
kernels.

Alex
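
For anyone carrying a local 5.15.y build in the meantime, here is a minimal
sketch of the revert workaround Barry describes. It assumes a linux-stable
checkout in which commit 56b522f4668167096a50c39446d6263c96219f5f still
reverts cleanly. The helper name revert_and_export and the output directory
are illustrative only; this is not the patch attached to the thread.

```shell
#!/bin/sh
# Sketch: revert one commit on the current branch and export the revert
# as a patch file. revert_and_export is a hypothetical helper name, not
# part of git or the kernel tree.
revert_and_export() {
    commit=$1   # commit to revert
    outdir=$2   # directory that receives the generated patch file

    # Create a new commit that undoes "$commit"; --no-edit keeps the
    # default "Revert ..." message instead of opening an editor.
    git revert --no-edit "$commit" || return 1

    # HEAD is now the revert commit; write it out in mailbox patch format.
    git format-patch -o "$outdir" -1 HEAD
}

# Example, run inside the kernel source tree (assumed to be on a 5.15.y tag):
#   revert_and_export 56b522f4668167096a50c39446d6263c96219f5f /tmp
```

If the revert does not apply cleanly, git stops and reports the conflict,
and the helper returns non-zero without writing a patch.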


> Armin Wolf
>
> >
> > On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler at gmail.com> wrote:
> >> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin at gmx.de> wrote:
> >>>> ...
> >>>> # lspci | grep VGA
> >>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
> >>>> Series] (rev c2)
> >>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
> >>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
> >>>>
> >>>> # lspci -n -k
> >>>> ...
> >>>> 05:00.0 0300: 1002:15d8 (rev c2)
> >>>> Subsystem: 1025:1456
> >>>> Kernel driver in use: amdgpu
> >>>> Kernel modules: amdgpu
> >>>> ...
> >>> Thanks for informing us of this regression. Since there are four commits affecting
> >>> amdgpu in 5.15.150, I suggest that you use "git bisect" to find the faulty commit,
> >>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
> >>>
> >>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
> >>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
> >>> man page of "git bisect" for details.
> >>>
> >>> Thanks,
> >>> Armin Wolf
> >> Armin,
> >> Thanks for the advice. I am unfamiliar with git on the command line.
> >> Previously I only used the SmartGit GUI.
> >> EasyOS requires aufs patch, and for a few days tried to figure out how
> >> to use that with git bisect, then gave up. Changed to testing with my
> >> "QV" distro, which is more conventional, doesn't need any kernel
> >> patches. Managed to get it down to one commit. Here are the steps I
> >> followed:
> >>
> >> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >> # cd linux-stable
> >> # git tag -l | grep '5\.15\.150'
> >> v5.15.150
> >> # git checkout -b my5.15.150 v5.15.150
> >> Updating files: 100% (65776/65776), done.
> >> Switched to a new branch 'my5.15.150'
> >>
> >> Copied in my .config then...
> >>
> >> # make menuconfig
> >> # git bisect start -- drivers/gpu/drm/amd
> >> # git bisect bad
> >> # git bisect good v5.15.149
> >> Bisecting: 1 revision left to test after this (roughly 1 step)
> >> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
> >> s3 suspend abort case
> >> # make
> >> # rm -rf /boot2
> >> # mkdir -p /boot2/lib/modules
> >> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >> # sync
> >> ...QV on Acer laptop, with amdgpu, works!
> >> # git bisect good
> >> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
> >> after amdkfd device init
> >> # make
> >> # mkdir -p /boot2/lib/modules
> >> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >> # sync
> >> ...QV on Acer laptop, black screen!
> >>
> >> # git bisect bad
> >> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
> >> commit 56b522f4668167096a50c39446d6263c96219f5f
> >> Author: Yifan Zhang <yifan1.zhang at amd.com>
> >> Date:   Tue Sep 28 15:42:35 2021 +0800
> >>
> >>      drm/amdgpu: init iommu after amdkfd device init
> >>
> >>      [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
> >>
> >>      This patch is to fix clinfo failure in Raven/Picasso:
> >>
> >>      Number of platforms: 1
> >>        Platform Profile: FULL_PROFILE
> >>        Platform Version: OpenCL 2.2 AMD-APP (3364.0)
> >>        Platform Name: AMD Accelerated Parallel Processing
> >>        Platform Vendor: Advanced Micro Devices, Inc.
> >>        Platform Extensions: cl_khr_icd cl_amd_event_callback
> >>
> >>        Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
> >>
> >>      Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com>
> >>      Reviewed-by: James Zhu <James.Zhu at amd.com>
> >>      Tested-by: James Zhu <James.Zhu at amd.com>
> >>      Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >>      Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>      Signed-off-by: Sasha Levin <sashal at kernel.org>
> >>
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> Anything else I should do, to identify what in this commit is the
> >> likely culprit?
> >> Regards,
> >> Barry Kauler
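
The bisect setup quoted above (checkout, "git bisect start", "git bisect bad",
"git bisect good v5.15.149") can be condensed into one call, since
"git bisect start" accepts the bad and good revisions directly. This is a
rough sketch; the helper name start_amd_bisect is hypothetical, and the
build, install and boot test at each step still has to be done by hand as
Barry did.

```shell
#!/bin/sh
# Sketch: start a bisect between a known-bad and a known-good revision,
# considering only commits that touch the AMD DRM driver directory.
# start_amd_bisect is a hypothetical helper name.
start_amd_bisect() {
    bad=$1    # revision that shows the black screen, e.g. v5.15.150
    good=$2   # last revision known to work, e.g. v5.15.149

    # Everything after "--" is a path filter, so git skips commits
    # outside drivers/gpu/drm/amd when choosing what to test next.
    git bisect start "$bad" "$good" -- drivers/gpu/drm/amd
}

# Example, run inside the linux-stable checkout:
#   start_amd_bisect v5.15.150 v5.15.149
#   ...build, install, boot, then report the result:
#   git bisect good     # or: git bisect bad
#   git bisect reset    # when finished, return to the original branch
```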

