[Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm

Zhang, Xiong Y xiong.y.zhang at intel.com
Thu Apr 27 05:54:58 UTC 2017


> + David and Jon
> 
> On Tue, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> 
> The blocking issue I see is that bisecting is still not pointing at
> relevant commits. Both bisected commits from Bugzilla are not related
> to changes in stolen memory usage behavior. I'd assume a successful
> bisect to land at the patches where we start creating kernel internal
> objects from stolen memory. Otherwise we could be ignoring a bug
> elsewhere. If it consistently lands on those patches, then there might
> be something wrong with them, in addition to stolen memory problems.
[Zhang, Xiong Y] I have only tried kernels 4.8 and 4.9. As described in the Bugzilla
entry, with a 4.8 guest kernel there is no GPU hang in the guest dmesg, while with a
4.9 guest kernel there is. Looking at that alone, a git bisect seems possible.
However, the host dmesg shows a flood of IOMMU DMA read/write faults on stolen memory
with both the 4.8 and the 4.9 guest kernels. This means the guest domain's IOMMU table
has no mapping for stolen memory, and the IGD fails to access stolen memory with both
guest kernels. Looking at that, the issue is not a regression and git bisect is not
the right tool. You can check these host error messages in the Bugzilla attachment.
This should be fixed first.
Anyway, I will do my best to find the offending commit through git bisect, but I'm
afraid the result will be the same as before, because we don't have a stable good
point to start bisecting from.

> Disabling power saving makes many bugs go away, but we still don't
> disable power saving as a resolution to such bugs, but instead root
> cause and fix the individual bugs.
[Zhang, Xiong Y] I added i915.enable_rc6=0, i915.enable_dc=0, i915.enable_fbc=0,
i915.enable_psr=0, i915.disable_power_well=0 and i915.enable_ips=0 to the kernel
command line in GRUB, but the GPU hang still occurs in the guest and the DMA R/W
errors still occur in the host.
> 
> > Stolen memory isn't a standard PCI resource; it lives in an RMRR, which has
> > an identity mapping in the IOMMU table when the host boots, so the IGD can
> > access stolen memory in the host OS. However, according to commit c875d2c1b808
> > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains"), RMRRs
> > aren't supported by KVM, so both the EPT and the guest IOMMU domain table
> > lack a mapping for stolen memory in a KVM IGD passthrough environment.
> 
> Commit message text still fails to address that an exclusion was added
> by commit:
> 
> commit 18436afdc11a00ac881990b454cfb2eae81d6003
> Author: David Woodhouse <David.Woodhouse at intel.com>
> Date:   Wed Mar 25 15:05:47 2015 +0000
> 
>     iommu/vt-d: Allow RMRR on graphics devices too
> 
>     Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
>     domains") prevents certain options for devices with RMRRs. This even
>     prevents those devices from getting a 1:1 mapping with 'iommu=pt',
>     because we don't have the code to handle *preserving* the RMRR regions
>     when moving the device between domains.
> 
> <SNIP>
> 
> The quoted part of David's commit message leads me to believe it's
> simply lack of some code in kernel for juggling the RMRRs when moving a
> device between domains that is missing. Why is not that considered
> instead? With that implemented, we would have more transparent pass-
> through, which should be good.
[Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from
IOMMU API domains") prevents devices associated with RMRRs from being assigned
to a guest. One of the reasons is that RMRRs are not supported in a guest
domain's IOMMU table; if such a device's driver still accesses an RMRR from the
guest, serious errors will happen.
18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too") adds an exception
to the above commit, so the IGD can be assigned to a guest. But this doesn't mean
a 1:1 mapping of the IGD's RMRR will be supported in the guest domain's IOMMU table.
'iommu=pt' sets up a 1:1 mapping for all PCI devices in the host domain's IOMMU table.

When a device is assigned to a guest and that guest boots, the guest domain's
IOMMU table takes the place of the host domain's IOMMU table in hardware. Our issue
is that the guest domain's IOMMU table doesn't have a 1:1 mapping for the RMRR.
To set up a 1:1 mapping for the RMRR in the guest domain's IOMMU table, we would
have to modify KVM and QEMU, and the KVM community has declined this.
> 
> Also, was fixing the IGD driver loading with zero stolen memory
> considered instead? All this information should exist in the commit
> message.
[Zhang, Xiong Y] The IGD and the i915 driver read PCI config register 0x50 to get
the size of stolen memory. When the guest reads this register, QEMU can trap the
access and return whatever value it chooses to the guest.
So in order to do "fixing the IGD driver loading with zero stolen memory",
we would have to modify both QEMU and the IGD driver:
1) QEMU: trap reads of PCI config register 0x50 and return zero to the guest.
2) IGD driver: when the IGD driver sees a zero stolen memory size, continue loading
instead of bailing out.
This doesn't give any benefit to i915: i915 will still disable stolen memory because
it sees a zero stolen memory size. So I prefer to disable stolen memory in i915
directly and keep QEMU and the IGD driver unchanged.
> 
> After the bisecting is properly done, there is an agreement that
> suggested RMRR preservation is absolutely a no-go, other options are
> not viable, the commit message should be updated to reflect all that.
> Then we should look in more detail on how to detect the scenarios when
> we're running in a virtual machine that doesn't set up the 1:1 mapping
> for RMRRs.
[Zhang, Xiong Y] Sure, I will do this once we reach agreement.
I really need help from others who can correct me if I am wrong.
> 
> Regards, Joonas
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation

