hyper_bf soft lockup on Azure Gen2 VM when taking kdump or executing kexec

Michael Kelley mhklinux at outlook.com
Mon Feb 3 21:08:04 UTC 2025


From: Thomas Tai <thomas.tai at oracle.com> Sent: Thursday, January 30, 2025 12:44 PM
> 
> > -----Original Message-----
> > From: Michael Kelley <mhklinux at outlook.com>
> > Sent: Thursday, January 30, 2025 3:20 PM
> > To: Thomas Tai <thomas.tai at oracle.com>; mhkelley58 at gmail.com;
> > haiyangz at microsoft.com; wei.liu at kernel.org; decui at microsoft.com;
> > drawat.floss at gmail.com; javierm at redhat.com; Helge Deller
> > <deller at gmx.de>; daniel at ffwll.ch; airlied at gmail.com;
> > tzimmermann at suse.de
> > Cc: dri-devel at lists.freedesktop.org; linux-fbdev at vger.kernel.org; linux-
> > kernel at vger.kernel.org; linux-hyperv at vger.kernel.org
> > Subject: RE: hyper_bf soft lockup on Azure Gen2 VM when taking kdump or
> > executing kexec
> >
> > From: Thomas Tai <thomas.tai at oracle.com> Sent: Thursday, January 30,
> > 2025 10:50 AM
> > >
> > > Sorry for the typo in the subject title. It should have been 'hyperv_fb soft lockup on
> > > Azure Gen2 VM when taking kdump or executing kexec'
> > >
> > > Thomas
> > >
> > > >
> > > > Hi Michael,
> > > >
> > > > We see an issue with the mainline kernel on the Azure Gen 2 VM when
> > > > trying to induce a kernel panic with sysrq commands. The VM would hang
> > > > with soft lockup. A similar issue happens when executing kexec on the VM.
> > > > This issue is seen only with Gen2 VMs (with UEFI boot). Gen1 VMs with BIOS
> > > > boot are fine.
> > > >
> > > > git bisect identifies the issue is caused by the commit 20ee2ae8c5899
> > > > ("fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()" ).
> > > > However, reverting the commit would cause the frame buffer not to work
> > > > on the Gen2 VM.
> > > >
> > > > Do you have any hints on what caused this issue?
> > > >
> > > > To reproduce the issue with kdump:
> > > > - Install mainline kernel on an Azure Gen 2 VM and trigger a kdump
> > > > - echo 1 > /proc/sys/kernel/sysrq
> > > > - echo c > /proc/sysrq-trigger
> > > >
> > > > To reproduce the issue with executing kexec:
> > > > - Install mainline kernel on Azure Gen 2 VM and use kexec
> > > > - sudo kexec -l /boot/vmlinuz --initrd=/boot/initramfs.img --command-line="$( cat /proc/cmdline )"
> > > > - sudo kexec -e
> > > >
> > > > Thank you,
> > > > Thomas
> >
> > I will take a look, but it might be early next week before I can do so.
> >
> 
> Thank you, Michael for your help!
> 
> > It looks like your soft lockup log below is from the kdump kernel (or the newly
> > kexec'ed kernel). Can you confirm? Also, this looks like a subset of the full log.
> 
> Yes, the soft lockup log below is from the kdump kernel.
> 
> > Do you have the full serial console log that you could email to me?  Seeing
> > everything might be helpful. Of course, I'll try to repro the problem myself
> > as well.
> 
> I have attached the complete bootup and kdump kernel log.
> 
> File: bootup_and_kdump.log
> Line 1 ... 984 (bootup log)
> Line 990       (kdump kernel booting up)
> Line 1351      (soft lockup)
> 
> Thank you,
> Thomas
> 

I have reproduced the problem in an Azure VM running Oracle Linux
9.4 with the 6.13.0 kernel. Interestingly, the problem does not occur
in a VM running on a locally installed Hyper-V with Ubuntu 20.04 and
the 6.13.0 kernel. There are several differences in the two
environments:  the version of Hyper-V, the VM configuration, the Linux
distro, and the .config file used to build the 6.13.0 kernel. I'll try to
figure out what makes the difference, and then the root cause.
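
For the .config part of that comparison, something like the following
could be a starting point. It is only a rough sketch: the config_diff
helper name and the example paths are placeholders made up for
illustration, not anything taken from the two environments. It lists
the CONFIG_ options that differ between two kernel .config files.

```shell
# config_diff: hypothetical helper (name is illustrative only).
# Prints the CONFIG_ option lines that differ between two .config files.
# Returns diff's exit status: 0 if identical, 1 if they differ.
config_diff() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || return 2

    # Keep only option lines ("CONFIG_X=y" and "# CONFIG_X is not set"),
    # then sort so that ordering differences are not reported as changes.
    grep -E '^(# )?CONFIG_' "$1" | sort > "$a_sorted"
    grep -E '^(# )?CONFIG_' "$2" | sort > "$b_sorted"

    diff "$a_sorted" "$b_sorted"
    rc=$?

    rm -f "$a_sorted" "$b_sorted"
    return $rc
}

# Example invocation (placeholder paths):
#   config_diff /path/to/azure-vm.config /path/to/local-vm.config
```

Lines prefixed with "<" come from the first file and ">" from the
second. The kernel tree's scripts/diffconfig does a similar job with
more compact output.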

Michael

