hyper_bf soft lockup on Azure Gen2 VM when taking kdump or executing kexec

Tue Feb 11 19:45:48 UTC 2025

From: Maxim Levitsky <mlevitsk at redhat.com> Sent: Monday, February 10, 2025 3:57 PM
> 
> On Mon, 2025-02-10 at 21:35 +0000, Michael Kelley wrote:
> > From: thomas.tai at oracle.com <thomas.tai at oracle.com> Sent: Monday, February 10, 2025 7:08 AM
> > > <snip>
> > >
> > > > > Then the question is why the efifb driver doesn't work in the kdump
> > > > > kernel. Actually, it *does* work in many cases. I built the 6.13.0 kernel
> > > > > on the Oracle Linux 9.4 system, and transferred the kernel image binary
> > > > > and module binaries to an Ubuntu 20.04 VM in Azure. In that VM, the
> > > > > efifb driver is loaded as part of the kdump kernel, and it doesn't cause
> > > > > any problems. But there's an interesting difference. In the Oracle Linux
> > > > > 9.4 VM, the efifb driver finds the framebuffer at 0x40000000, while on
> > > > > the Ubuntu 20.04 VM, it finds the framebuffer at 0x40900000. This
> > > > > difference is due to differences in how the screen_info variable gets
> > > > > > set up in the two VMs.
> > > > >
> > > > > When the normal kernel starts in a freshly booted VM, Hyper-V provides
> > > > > the EFI framebuffer at 0x40000000, and it works. But after the Hyper-V
> > > > > FB driver or Hyper-V DRM driver has initialized, Linux has picked a
> > > > > different MMIO address range and told Hyper-V to use the new
> > > > > address range (which often starts at 0x40900000). A kexec does *not*
> > > > > reset Hyper-V's transition to the new range, so when the efifb driver
> > > > > tries to use the framebuffer at 0x40000000, the accesses trap to
> > > > > > Hyper-V and probably fail or time out (I'm not sure of the details). After
> > > > > the guest does some number of these bad references, Hyper-V considers
> > > > > itself to be under attack from an ill-behaving guest, and throttles the
> > > > > guest so that it doesn't run for a few seconds. The throttling repeats,
> > > > > and results in extremely slow running in the kdump kernel.
> > > > >
> > > > > Somehow in the Ubuntu 20.04 VM, the location of the frame buffer
> > > > > as stored in screen_info.lfb_base gets updated to be 0x40900000. I
> > > > > haven't fully debugged how that happens. But with that update, the
> > > > > efifb driver is using the updated framebuffer address and it works. On
> > > > > the Oracle Linux 9.4 system, that update doesn't appear to happen,
> > > > > and the problem occurs.
> > > > >
> > > > > > This is an interim update on the problem. I'm still investigating how
> > > > > screen_info.lfb_base is set in the kdump kernel, and why it is different
> > > > > in the Ubuntu 20.04 VM vs. in the Oracle Linux 9.4 VM. Once that is
> > > > > well understood, we can contemplate how to fix the problem. Undoing
> > > > > the revert that is commit 2bebc3cd4870 doesn't seem like the solution
> > > > > since the original code there was reported to cause many other issues.
> > > > > The solution focus will likely be on how to ensure the kdump kernel gets
> > > > > the correct framebuffer address so the efifb driver works, since the
> > > > > framebuffer address changing is a quirk of Hyper-V behavior.
> > > > >
> > > > > If anyone else has insight into what's going on here, please chime in.
> > > > > What I've learned so far is still somewhat tentative.
> > > > >
> > > > Here's what is happening. On Ubuntu 20.04, the kdump image is
> > > > loaded into crash memory using the kexec command. Ubuntu 20.04
> > > > has kexec from the kexec-tools package version 2.0.18-1ubuntu1.1,
> > > > and per the kexec man page, it defaults to using the older kexec_load()
> > > > system call. When using kexec_load(), the contents to be loaded into
> > > > > crash memory are constructed in user space by the kexec command.
> > > > The kexec command gets the "screen_info" settings, including the
> > > > physical address of the frame buffer, via the FBIOGET_FSCREENINFO
> > > > ioctl against /dev/fb0. The Hyper-V FB or DRM driver registers itself
> > > > with the fbdev subsystem so that it is /dev/fb0, and the ioctl returns
> > > > the updated framebuffer address. So the efifb driver loads and runs
> > > > correctly.
> > > >
> > > > On Oracle Linux 9.4, the kdump image is also loaded with the
> > > > kexec command, but from kexec-tools package version
> > > > kexec-tools-2.0.28-1.0.10.el9_5.x86_64, which is slightly later than
> > > > the version on Ubuntu 20.04. This newer kexec defaults to using the
> > > > newer kexec_file_load() system call. This system call gets the
> > > > framebuffer address from the screen_info variable in the kernel, which
> > > > has not been updated to reflect the new framebuffer address. Hence
> > > > in the kdump kernel, the efifb driver uses the old framebuffer address,
> > > > and hence the problem.
> > > >
> > > > To further complicate matters, the kexec on Oracle Linux 9.4 seems to
> > > > have a bug when the -c option forces the use of kexec_load() instead
> > > > of kexec_file_load(). As an experiment, I modified the kdumpctl shell
> > > > script to add the "-c" option to kexec, but in that case the value "0x0"
> > > > is passed as the framebuffer address, which is wrong. Furthermore,
> > > > > the "screen_info.orig_video_isVGA" value (which I mentioned earlier
> > > > in connection with commit 2bebc3cd4870) is also set to 0, so the
> > > > kdump kernel no longer thinks it has an EFI framebuffer. Hence the
> > > > efifb driver isn't loaded, and the kdump works, though for the wrong
> > > > reasons. If kexec 2.0.18 from Ubuntu is copied onto the Oracle Linux 9.4
> > > > VM, then kdump works as expected, with the efifb driver being loaded
> > > > and using the correct framebuffer address. So something is going wrong
> > > > with kexec 2.0.28 in how it sets up the screen_info when the -c option
> > > > is used. I'll leave the debugging of the kexec bug to someone else.
> > >
> > > Hi Michael,
> > >
> > > Do you think we need to handle Azure Gen2 VM differently in the kexec?
> > >
> > > Or should we change the kexec_file_load() system call to retrieve the correct
> > > framebuffer address?
> >
> > I'm thinking there may be a fix in the Hyper-V FB and Hyper-V DRM drivers.
> > Commit c25a19afb81c may also be a cause of the problem -- see precursor
> > commit 3cb73bc3fa2a, which describes exactly the problem. I still need to
> > do some testing, but without that commit, kdump won't detect that it has
> > an EFI framebuffer, won't load the efifb driver, and so won't encounter the
> > problem. But we probably need to get Thomas Zimmermann to weigh in on
> > the implications of reverting c25a19afb81c.
> >
> > There's one additional variation of the problem. Assume the Hyper-V FB
> > driver is loaded (for example) during boot and moves the framebuffer. Then
> > the system runs kexec as part of arming kdump during the boot sequence.
> > The most recent location of the framebuffer (and whether it is an EFI framebuffer)
> > gets picked at the time kexec runs, and is stored in the crash kernel memory area.
> > But what if the framebuffer later moves, perhaps because the Hyper-V FB driver
> > is unbound? The crash kernel memory area doesn’t get updated and kdump
> > could still have the wrong framebuffer address. This anomaly argues for the
> > commit 3cb73bc3fa2a approach of just ensuring that the efifb driver doesn't
> > load. Of course that approach means that the kdump kernel *must* contain
> > either the Hyper-V FB or Hyper-V DRM driver in order to work on a system
> > with only a framebuffer for text output. The efifb driver won't work. But
> > perhaps that's OK.
> >
> > Changing kexec (or the invoking script) to special case Hyper-V Gen 2 VMs and
> > always use kexec_load() instead of kexec_file_load() sounds like a big hack
> > to me.  And with that approach, you give up the ability to enforce loading only
> > properly signed kdump images. This is something kexec_file_load() provides
> > that kexec_load() doesn't, and is one of the main reasons that kexec_file_load()
> > was added.
> >
> > Whether the kexec_file_load() system call could be enhanced to get the
> > frame buffer information from the /dev/fb0 device, I'm not sure. That might
> > be a reasonable approach, though it still has the problem that the framebuffer
> > address could change *after* kexec_file_load() runs.
> >
> > Anyway, that's a dump of my current thoughts. I haven't reached a final
> > conclusion or recommendation yet. Comments from others on the
> > thread are welcome.
> 
> Hi!
> 
> Asking because I also had to do some digging in this area:
> 
> Do you think that the kernel can *ask* the hypervisor where the framebuffer is instead
> of relying on bios, the bootloader and/or kexec to somehow provide this information?
> 
> If hyperv doesn't provide this API, how hard would it be in your opinion to provide it?
> 
> I am asking because I also had to debug a RHEL downstream issue where a
> slightly botched backport caused the first stage of the compressed UEFI
> boot image to stop passing 'screen_info' to the second stage (the kernel
> itself). As a result of this, the second stage stopped loading simplefb,
> and as a result of *this*, the PCI driver started to try to use the
> framebuffer range for its own use, which failed and resulted in a cryptic
> error.
> 
> If the kernel were to just issue some form of a hypercall to ask the
> hypervisor where the framebuffer currently is, we could avoid a whole
> class of bugs similar to this.
> What do you think?
> 

I'm not aware of a way to ask Hyper-V about the framebuffer location.
I had not previously thought about such a possibility, so it's worth
thinking through. Here's how I see it: The issue is with generic drivers like
efifb (and others) that are hardcoded to read screen_info.lfb_base to
find the framebuffer. So the proposed new hypercall would need to be
made relatively early during boot, and it would update screen_info.lfb_base
to reflect the current location of the framebuffer. Hypercalls can only be
made after the setup in hyperv_init() is done. Fortunately, that's probably
before any framebuffer driver would read screen_info.lfb_base, though I'm
not completely sure.
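
As a rough illustration of the "current location" half of this, user
space can already ask the fbdev driver behind /dev/fb0 where its
framebuffer is, using the FBIOGET_FSCREENINFO ioctl mentioned earlier in
the thread for the kexec_load() path. The sketch below is only that, a
sketch: it is not taken from the kexec sources, the helper names are
made up for the example, and the struct format is transcribed by hand
from struct fb_fix_screeninfo in <linux/fb.h>, assuming the native ABI
of the machine it runs on.

```python
import fcntl
import struct

FBIOGET_FSCREENINFO = 0x4602  # ioctl number from <linux/fb.h>

# Layout of struct fb_fix_screeninfo with native alignment (assumption:
# transcribed by hand from <linux/fb.h>). The trailing "0L" pads the
# format out to the full size of the C struct.
FIX_FMT = "@16sLIIIIHHHILIIH2H0L"
FIX_SIZE = struct.calcsize(FIX_FMT)


def parse_fix_screeninfo(raw):
    """Return (driver id, smem_start, smem_len) from a raw struct."""
    fields = struct.unpack(FIX_FMT, raw[:FIX_SIZE])
    driver_id = fields[0].split(b"\0", 1)[0].decode("ascii", "replace")
    return driver_id, fields[1], fields[2]


def read_fb_location(path="/dev/fb0"):
    """Ask the fbdev driver behind `path` where its framebuffer is."""
    with open(path, "rb") as fb:
        # With a bytes argument, fcntl.ioctl() returns the filled-in
        # buffer as a new bytes object.
        raw = fcntl.ioctl(fb, FBIOGET_FSCREENINFO, bytes(FIX_SIZE))
    return parse_fix_screeninfo(raw)
```

Calling read_fb_location() in the normal kernel and comparing the
returned smem_start with the address efifb reports in the kdump kernel
would be one way to spot the mismatch. It does nothing to solve the
problem that in-kernel users read screen_info.lfb_base directly.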

Another factor is that the Hyper-V framebuffer is provided by the QEMU
equivalent that's embedded in the overall Hyper-V host, and not by
the hypervisor itself. The framebuffer is a VMBus device. So the Hyper-V
people would probably want getting the framebuffer location to be a
VMBus message to the framebuffer device, not a hypercall. And the VMBus
machinery isn't set up until later -- too late, in fact, to change
screen_info.lfb_base before some generic driver reads it. So that's likely
to be a problem with the idea, though I'm speculating on what the 
Hyper-V folks would say.

The last factor is getting Hyper-V to add the feature. Somebody on the
Microsoft side would need to carry that request to the Hyper-V team.
I'm former Microsoft, but retired 1+ years ago, so I'm now just an unpaid
hobbyist contributing to the kernel because I enjoy the challenge. :-) But
I no longer have the Microsoft insider connection to the Hyper-V team.
From my past experience, getting such features added is hard, and takes
a long time (years?) to get implemented and rolled out across the Azure
fleet, unless there's some critical issue that needs to be addressed. This
kdump problem probably doesn't reach that level of criticality.

So I'm not super optimistic about the idea. But maybe your thinking
is different from what I've laid out. I'm happy to hear further discussion.

Michael