Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

Barry Kauler bkauler at gmail.com
Thu May 23 13:13:00 UTC 2024


On Wed, May 22, 2024 at 12:58 AM Armin Wolf <W_Armin at gmx.de> wrote:
>
> Am 20.05.24 um 18:22 schrieb Alex Deucher:
>
> > On Sat, May 18, 2024 at 8:17 PM Armin Wolf <W_Armin at gmx.de> wrote:
> >> Am 17.05.24 um 03:30 schrieb Barry Kauler:
> >>
> >>> Armin, Yifan, Prike,
> >>> I will top-post, so you don't have to scroll down.
> >>> After identifying the commit that causes black screen with my gpu, I
> >>> posted the result to you guys, on May 9.
> >>> It is now May 17 and no reply.
> >>> OK, I have now created a patch that reverts Yifan's commit, compiled
> >>> 5.15.158, and my gpu now works.
> >>> Note, the radeon module is not loaded, so it is not a factor.
> >>> I'm not a kernel developer. I have identified the culprit and it is up
> >>> to you guys to fix it, Yifan especially, as you are the person who has
> >>> created the regression.
> >>> I will attach my patch.
> >>> Regards,
> >>> Barry Kauler
> >> Hi,
> >>
> >> sorry for not responding to your findings. I normally do not work with GPU drivers,
> >> so I hoped one of the amdgpu developers would handle this.
> >>
> >> I CCed dri-devel at lists.freedesktop.org and amd-gfx at lists.freedesktop.org so that other
> >> amdgpu developers hear about this issue.
> >>
> >> Thank you for your persistence in finding the offending commit.
> > Likely this patch should not have been ported to 5.15 in the first
> > place.  The IOMMU requirements have been dropped from the driver for
> > the last few kernel versions so it is no longer relevant on newer
> > kernels.
> >
> > Alex
>
> Barry, can you verify that the latest upstream kernel works on your device?
> If yes, then the commit itself is OK and only the backport was wrong.
>
> Thanks,
> Armin Wolf

Armin,
The unmodified 6.8.1 kernel works ok.
I presume that patch was applied long before 6.8.1 got released and
only got backported to 5.15.x recently.

Regards,
Barry


> >> Armin Wolf
> >>
> >>> On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler at gmail.com> wrote:
> >>>> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin at gmx.de> wrote:
> >>>>>> ...
> >>>>>> # lspci | grep VGA
> >>>>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>>>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
> >>>>>> Series] (rev c2)
> >>>>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
> >>>>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
> >>>>>>
> >>>>>> # lspci -n -k
> >>>>>> ...
> >>>>>> 05:00.0 0300: 1002:15d8 (rev c2)
> >>>>>> Subsystem: 1025:1456
> >>>>>> Kernel driver in use: amdgpu
> >>>>>> Kernel modules: amdgpu
> >>>>>> ...
> >>>>> Thanks for informing us of this regression. Since there are four commits affecting
> >>>>> amdgpu in 5.15.150, I suggest that you use "git bisect" to find the faulty commits,
> >>>>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
> >>>>>
> >>>>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
> >>>>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
> >>>>> man page of "git bisect" for details.
> >>>>>
> >>>>> Thanks,
> >>>>> Armin Wolf
> >>>> Armin,
> >>>> Thanks for the advice. I am unfamiliar with git on the command line;
> >>>> I had previously only used the SmartGit GUI.
> >>>> EasyOS requires an aufs patch, and for a few days I tried to figure out how
> >>>> to use that with git bisect, then gave up. I changed to testing with my
> >>>> "QV" distro, which is more conventional and doesn't need any kernel
> >>>> patches. I managed to get it down to one commit. Here are the steps I
> >>>> followed:
> >>>>
> >>>> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >>>> # cd linux-stable
> >>>> # git tag -l | grep '5\.15\.150'
> >>>> v5.15.150
> >>>> # git checkout -b my5.15.150 v5.15.150
> >>>> Updating files: 100% (65776/65776), done.
> >>>> Switched to a new branch 'my5.15.150'
> >>>>
> >>>> Copied in my .config then...
> >>>>
> >>>> # make menuconfig
> >>>> # git bisect start -- drivers/gpu/drm/amd
> >>>> # git bisect bad
> >>>> # git bisect good v5.15.149
> >>>> Bisecting: 1 revision left to test after this (roughly 1 step)
> >>>> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
> >>>> s3 suspend abort case
> >>>> # make
> >>>> # rm -rf /boot2
> >>>> # mkdir -p /boot2/lib/modules
> >>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >>>> # sync
> >>>> ...QV on Acer laptop, with amdgpu, works!
> >>>> # git bisect good
> >>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >>>> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
> >>>> after amdkfd device init
> >>>> # make
> >>>> # mkdir -p /boot2/lib/modules
> >>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
> >>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
> >>>> # sync
> >>>> ...QV on Acer laptop, black screen!
> >>>>
> >>>> # git bisect bad
> >>>> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
> >>>> commit 56b522f4668167096a50c39446d6263c96219f5f
> >>>> Author: Yifan Zhang <yifan1.zhang at amd.com>
> >>>> Date:   Tue Sep 28 15:42:35 2021 +0800
> >>>>
> >>>>       drm/amdgpu: init iommu after amdkfd device init
> >>>>
> >>>>       [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
> >>>>
> >>>>       This patch is to fix clinfo failure in Raven/Picasso:
> >>>>
> >>>>       Number of platforms: 1
> >>>>         Platform Profile: FULL_PROFILE
> >>>>         Platform Version: OpenCL 2.2 AMD-APP (3364.0)
> >>>>         Platform Name: AMD Accelerated Parallel Processing
> >>>>         Platform Vendor: Advanced Micro Devices, Inc.
> >>>>         Platform Extensions: cl_khr_icd cl_amd_event_callback
> >>>>
> >>>>         Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
> >>>>
> >>>>       Signed-off-by: Yifan Zhang <yifan1.zhang at amd.com>
> >>>>       Reviewed-by: James Zhu <James.Zhu at amd.com>
> >>>>       Tested-by: James Zhu <James.Zhu at amd.com>
> >>>>       Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >>>>       Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>>>       Signed-off-by: Sasha Levin <sashal at kernel.org>
> >>>>
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >>>>    1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> Anything else I should do, to identify what in this commit is the
> >>>> likely culprit?
> >>>> Regards,
> >>>> Barry Kauler
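
For anyone repeating the bisection quoted above, the build-and-stage commands
that were run at each step can be wrapped in one small script. This is only a
sketch: the names run, build_and_stage and DRY_RUN are illustrative and do not
come from the thread, while /boot2 is the staging directory used in the quoted
steps. By default it only prints the commands; setting DRY_RUN=0 inside a
kernel tree would actually run them.

```shell
#!/bin/sh
# Sketch of the per-step build and staging commands from the bisect log above.
# run, build_and_stage and DRY_RUN are illustrative names, not from the thread.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the kernel at the current bisect step and stage it under $1
# (defaults to /boot2, the directory used in the thread).
# The && chain stops at the first failing command, so nothing is
# removed or copied if the build itself fails.
build_and_stage() {
    dest=${1:-/boot2}
    run make &&
    run rm -rf "$dest" &&
    run mkdir -p "$dest/lib/modules" &&
    run make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH="$dest" modules_install &&
    run cp arch/x86/boot/bzImage "$dest/vmlinuz" &&
    run sync
}

# Dry run by default: lists the six commands for the /boot2 staging directory.
build_and_stage /boot2
```

After booting the staged kernel, the result is still reported by hand with
"git bisect good" or "git bisect bad", as in the quoted steps.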

