[RFC PATCH 0/3] KVM: x86: honor guest memory type

Jim Mattson jmattson at google.com
Fri Feb 14 21:56:48 UTC 2020


On Fri, Feb 14, 2020 at 1:47 PM Chia-I Wu <olvaffe at gmail.com> wrote:
>
> On Fri, Feb 14, 2020 at 11:52 AM Sean Christopherson
> <sean.j.christopherson at intel.com> wrote:
> >
> > On Fri, Feb 14, 2020 at 11:26:06AM +0100, Paolo Bonzini wrote:
> > > On 13/02/20 23:18, Chia-I Wu wrote:
> > > >
> > > > The bug you mentioned was probably this one
> > > >
> > > >   https://bugzilla.kernel.org/show_bug.cgi?id=104091
> > >
> > > Yes, indeed.
> > >
> > > > From what I can tell, the commit allowed the guests to create cached
> > > > mappings to MMIO regions and caused MCEs.  That is different than what
> > > > I need, which is to allow guests to create uncached mappings to system
> > > > ram (i.e., !kvm_is_mmio_pfn) when the host userspace also has uncached
> > > > mappings.  But it is true that this still allows the userspace & guest
> > > > kernel to create conflicting memory types.
> >
> > This is ok.
> >
> > > Right, the question is whether the MCEs were tied to MMIO regions
> > > specifically and if so why.
> >
> > 99.99999% likelihood the answer is "yes".  Cacheable accesses to non-existent
> > memory and most (all?) MMIO regions will cause a #MC.  This includes
> > speculative accesses.
> >
> > Commit fd717f11015f ("KVM: x86: apply guest MTRR virtualization on host
> > reserved pages") explicitly had a comment "1. MMIO: trust guest MTRR",
> > which is basically a direct avenue to generating #MCs.
> >
> > IIRC, WC accesses to non-existent memory will also cause #MC, but KVM has
> > bigger problems if it has PRESENT EPTEs pointing at garbage.
> >
> > > An interesting remark is in the footnote of table 11-7 in the SDM.
> > > There, for the MTRR (EPT for us) memory type UC you can read:
> > >
> > >   The UC attribute comes from the MTRRs and the processors are not
> > >   required to snoop their caches since the data could never have
> > >   been cached. This attribute is preferred for performance reasons.
> > >
> > > There are two possibilities:
> > >
> > > 1) the footnote doesn't apply to UC mode coming from EPT page tables.
> > > That would make your change safe.
> > >
> > > 2) the footnote also applies when the UC attribute comes from the EPT
> > > page tables rather than the MTRRs.  In that case, the host should use
> > > UC as the EPT page attribute if and only if it's consistent with the host
> > > MTRRs; it would be more or less impossible to honor UC in the guest MTRRs.
> > > In that case, something like the patch below would be needed.
> >
> > (2), the EPTs effectively replace the MTRRs.  The expectation being that
> > the VMM will always use EPT memtypes consistent with the MTRRs.
> This is my understanding as well.
>
> > > It is not clear from the manual why the footnote would not apply to WC; that
> > > is, the manual doesn't say explicitly that the processor does not do snooping
> > > for accesses to WC memory.  But I guess that must be the case, which is why I
> > > used MTRR_TYPE_WRCOMB in the patch below.
> >
> > A few paragraphs below table 11-12 states:
> >
> >   In particular, a WC page must never be aliased to a cacheable page because
> >   WC writes may not check the processor caches.
> >
> > > Either way, we would have an explanation of why creating cached mappings to
> > > MMIO regions would cause MCEs, and why in practice we're not seeing MCEs for
> > > guest RAM (the guest would have set WB for that memory in its MTRRs, not UC).
> >
> > Aliasing (physical) RAM with different memtypes won't cause #MC, just
> > memory corruption.
>
> What we need potentially gives the userspace (the guest kernel, to be
> exact) the ability to create conflicting memory types.  If we can be
> sure that the worst scenario is for a guest to corrupt its own memory,
> by only allowing aliases on physical ram, that seems alright.
>
> AFAICT, it is currently allowed on ARM (verified) and AMD (not
> verified, but svm_get_mt_mask returns 0 which supposedly means the NPT
> does not restrict what the guest PAT can do).  This diff would do the
> trick for Intel without needing any uapi change:

I would be concerned about Intel CPU errata such as SKX40 and SKX59.

