[Intel-gfx] [RFC PATCH v3 02/19] KVM: x86: inhibit APICv/AVIC when the guest and/or host changes apic id/base from the defaults.

Mon May 23 06:50:24 UTC 2022

On Sun, 2022-05-22 at 07:47 -0700, Jim Mattson wrote:
> On Sun, May 22, 2022 at 2:03 AM Maxim Levitsky <mlevitsk at redhat.com> wrote:
> > On Thu, 2022-05-19 at 16:06 +0000, Sean Christopherson wrote:
> > > On Wed, Apr 27, 2022, Maxim Levitsky wrote:
> > > > Neither of these settings should be changed by the guest and it is
> > > > a burden to support it in the acceleration code, so just inhibit
> > > > it instead.
> > > > 
> > > > Also add a boolean 'apic_id_changed' to indicate if apic id ever changed.
> > > > 
> > > > Signed-off-by: Maxim Levitsky <mlevitsk at redhat.com>
> > > > ---
> > > > +           return;
> > > > +
> > > > +   pr_warn_once("APIC ID change is unsupported by KVM");
> > > 
> > > It's supported (modulo x2APIC shenanigans), otherwise KVM wouldn't need to disable
> > > APICv.
> > 
> > Here, as I said, it would be nice to see that warning if someone complains.
> > Fact is that AVIC code was totally broken in this regard, and there are probably more,
> > so it would be nice to see if anybody complains.
> > 
> > If you insist, I'll remove this warning.
> 
> This may be fine for a hobbyist, but it's a terrible API in an
> enterprise environment. To be honest, I have no way of propagating
> this warning from /var/log/messages on a particular host to a
> potentially impacted customer. Worse, if they're not the first
> impacted customer since the last host reboot, there's no warning to
> propagate. I suppose I could just tell every later customer, "Your VM
> was scheduled to run on a host that previously reported, 'APIC ID
> change is unsupported by KVM.' If you notice any unusual behavior,
> that might be the reason for it," but that isn't going to inspire
> confidence. I could schedule a drain and reboot of the host, but that
> defeats the whole point of the "_once" suffix.

Mostly agree, and I read alrady few discussions about exactly this,
those warnings are mostly useless, but they are used in the
cases where we don't have the courage to just exit with KVM_EXIT_INTERNAL_ERROR.

I do not thing though that the warning is completely useless, 
as we often have the kernel log of the target machine when things go wrong, 
so *we* can notice it.
In other words a kernel warning is mostly useless but better that nothing.

About KVM_EXIT_WARNING, this is IMHO a very good idea, probably combined
with some form of taint flag, which could be read by qemu and then shown
over hmp/qmp interfaces.

Best regards,
	Maxim levitsky

> 
> I know that there's a long history of doing this in KVM, but I'd like
> to ask that we:
> a) stop piling on
> b) start fixing the existing uses
> 
> If KVM cannot emulate a perfectly valid operation, an exit to
> userspace with KVM_EXIT_INTERNAL_ERROR is warranted. Perhaps for
> operations that we suspect KVM might get wrong, we should have a new
> userspace exit: KVM_EXIT_WARNING?
> 
> I'm not saying that you should remove the warning. I'm just asking
> that it be augmented with a direct signal to userspace that KVM may no
> longer be reliable.
>