[RFC PATCH 0/3] KVM: x86: honor guest memory type

Chia-I Wu olvaffe at gmail.com
Fri Feb 14 21:47:08 UTC 2020


On Fri, Feb 14, 2020 at 11:52 AM Sean Christopherson
<sean.j.christopherson at intel.com> wrote:
>
> On Fri, Feb 14, 2020 at 11:26:06AM +0100, Paolo Bonzini wrote:
> > On 13/02/20 23:18, Chia-I Wu wrote:
> > >
> > > The bug you mentioned was probably this one
> > >
> > >   https://bugzilla.kernel.org/show_bug.cgi?id=104091
> >
> > Yes, indeed.
> >
> > > From what I can tell, the commit allowed the guests to create cached
> > > mappings to MMIO regions and caused MCEs.  That is different than what
> > > I need, which is to allow guests to create uncached mappings to system
> > > ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached
> > > mappings.  But it is true that this still allows the userspace & guest
> > > kernel to create conflicting memory types.
>
> This is ok.
>
> > Right, the question is whether the MCEs were tied to MMIO regions
> > specifically and if so why.
>
> 99.99999% likelihood the answer is "yes".  Cacheable accesses to non-existent
> memory and most (all?) MMIO regions will cause a #MC.  This includes
> speculative accesses.
>
> Commit fd717f11015f ("KVM: x86: apply guest MTRR virtualization on host
> reserved pages") explicitly had a comment "1. MMIO: trust guest MTRR",
> which is basically a direct avenue to generating #MCs.
>
> IIRC, WC accesses to non-existent memory will also cause #MC, but KVM has
> bigger problems if it has PRESENT EPTEs pointing at garbage.
>
> > An interesting remark is in the footnote of table 11-7 in the SDM.
> > There, for the MTRR (EPT for us) memory type UC you can read:
> >
> >   The UC attribute comes from the MTRRs and the processors are not
> >   required to snoop their caches since the data could never have
> >   been cached. This attribute is preferred for performance reasons.
> >
> > There are two possibilities:
> >
> > 1) the footnote doesn't apply to UC mode coming from EPT page tables.
> > That would make your change safe.
> >
> > 2) the footnote also applies when the UC attribute comes from the EPT
> > page tables rather than the MTRRs.  In that case, the host should use
> > UC as the EPT page attribute if and only if it's consistent with the host
> > MTRRs; it would be more or less impossible to honor UC in the guest MTRRs.
> > In that case, something like the patch below would be needed.
>
> (2), the EPTs effectively replace the MTRRs.  The expectation being that
> the VMM will use always use EPT memtypes consistent with the MTRRs.
This is my understanding as well.

> > It is not clear from the manual why the footnote would not apply to WC; that
> > is, the manual doesn't say explicitly that the processor does not do snooping
> > for accesses to WC memory.  But I guess that must be the case, which is why I
> > used MTRR_TYPE_WRCOMB in the patch below.
>
> A few paragraphs below table 11-12 states:
>
>   In particular, a WC page must never be aliased to a cacheable page because
>   WC writes may not check the processor caches.
>
> > Either way, we would have an explanation of why creating cached mapping to
> > MMIO regions would, and why in practice we're not seeing MCEs for guest RAM
> > (the guest would have set WB for that memory in its MTRRs, not UC).
>
> Aliasing (physical) RAM with different memtypes won't cause #MC, just
> memory corruption.

What we need potentially gives the userspace (the guest kernel, to be
exact) the ability to create conflicting memory types.  If we can be
sure that the worst scenario is for a guest to corrupt its own memory,
by only allowing aliases on physical ram, that seems alright.

AFAICT, it is currently allowed on ARM (verified) and AMD (not
verified, but svm_get_mt_mask returns 0 which supposedly means the NPT
does not restrict what the guest PAT can do).  This diff would do the
trick for Intel without needing any uapi change:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..88af11cc551a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6905,12 +6905,6 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu
*vcpu, gfn_t gfn, bool is_mmio)
                goto exit;
        }

-       if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
-               ipat = VMX_EPT_IPAT_BIT;
-               cache = MTRR_TYPE_WRBACK;
-               goto exit;
-       }
-
        if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
                ipat = VMX_EPT_IPAT_BIT;
                if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))



> > One thing you didn't say: how would userspace use KVM_MEM_DMA?  On which
> > regions would it be set?
> >
> > Thanks,
> >
> > Paolo
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index dc331fb06495..2be6f7effa1d 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6920,8 +6920,16 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> >       }
> >
> >       cache = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
> > -
> >  exit:
> > +     if (cache == MTRR_TYPE_UNCACHABLE && !is_mmio) {
> > +             /*
> > +              * We cannot set UC in the EPT page tables as it can cause
> > +              * machine check exceptions (??).  Hopefully the guest is
> > +              * using PAT.
> > +              */
> > +             cache = MTRR_TYPE_WRCOMB;
>
> This is unnecessary.  Setting UC in the EPT tables will never directly lead
> to #MC.  Forcing WC is likely more dangerous.
>
> > +     }
> > +
> >       return (cache << VMX_EPT_MT_EPTE_SHIFT) | ipat;
> >  }
> >
> >


More information about the dri-devel mailing list