[Spice-devel] [Qemu-devel] viewing continuous guest virtual memory as continuous in qemu

Mon Oct 3 01:49:16 PDT 2011

On Mon, Oct 03, 2011 at 10:37:55AM +0200, Alon Levy wrote:
> On Mon, Oct 03, 2011 at 10:17:59AM +0200, Yonit Halperin wrote:
> > On 10/02/2011 03:24 PM, Alon Levy wrote:
> > >Hi,
> > >
> > >  I'm trying to achieve the $subject. Some background: currently spice relies on a preallocated pci bar for both surfaces and for VGA framebuffer + commands. I have been trying to get rid of the surfaces bar. To do that I allocate memory in the guest and then translate it for spice-server consumption using cpu_physical_memory_map.
> > >
> > >  AFAIU this works only when the guest allocates a contiguous range of physical pages. This is a heavy requirement to place on the guest, and one I'd like to drop. So I would like to have the guest use a regular allocator, generating for instance two sequential pages in virtual memory that are scattered in physical memory. Those two physical guest page addresses (gp1 and gp2) correspond to two host virtual memory addresses (hv1, hv2). I would now like to provide to spice-server a single virtual address p that maps to those two pages in sequence. I don't want to handle my own scatter-gather list; I would like to have this mapping done once so I can use an existing library that requires a single pointer (for instance pixman or libGL) to do the rendering.
> > >
> > >  Is there any way to achieve that without host kernel support, in user space, i.e. in qemu? Or with an existing host kernel device?
> > >
> > >  I'd appreciate any help,
> > >
> > >Alon
> > >_______________________________________________
> > >Spice-devel mailing list
> > >Spice-devel at lists.freedesktop.org
> > >http://lists.freedesktop.org/mailman/listinfo/spice-devel
> > 
> > Hi,
> > Won't there be overhead when rendering on a non-contiguous
> > surface? Will it be worthwhile compared to not creating the
> > surface?
> 
> If I use a scatter-gather list there is the overhead of allocating and
> copying the surface whenever I want to synchronize: at minimum one copy
> from guest to host, and another copy from host to guest
> for any update_area. (We can copy only the required area.)
> 
> If I use page remapping like remap_file_pages does, I don't think
> there is any overhead for rendering. There is overhead for doing
> the remap_file_pages calls, but it is minimal (or so the man page
> says). I should benchmark this.
> 
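
To make the remapping idea concrete, here is a minimal user-space sketch. It
is not qemu code. It assumes the pages live in a file that can be mapped
MAP_SHARED (in the qemu case that would mean guest RAM backed by a file
descriptor, which is an assumption here), and it uses plain mmap with
MAP_FIXED over a reserved range instead of remap_file_pages. The names
map_scattered and demo_scattered are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map n pages of fd, given by page-aligned file offsets in offs, back to
 * back in one contiguous virtual range. Returns MAP_FAILED on error. */
static void *map_scattered(int fd, const off_t *offs, size_t n, size_t page)
{
    /* Reserve a contiguous range of address space first. */
    unsigned char *base = mmap(NULL, n * page, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    /* Replace each reserved page with a shared mapping of one file page. */
    for (size_t i = 0; i < n; i++) {
        if (mmap(base + i * page, page, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, offs[i]) == MAP_FAILED) {
            munmap(base, n * page);
            return MAP_FAILED;
        }
    }
    return base;
}

/* Hypothetical self-check: file pages 3 and 1 stand in for gp1 and gp2.
 * Returns 0 if they appear in that order behind a single pointer. */
static int demo_scattered(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    off_t offs[2] = { (off_t)(3 * page), (off_t)(1 * page) };
    unsigned char *p;
    char c = 0;
    int ok;
    FILE *f = tmpfile();

    if (!f)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)(4 * page)) != 0 ||
        pwrite(fd, "A", 1, offs[0]) != 1 ||
        pwrite(fd, "B", 1, offs[1]) != 1) {
        fclose(f);
        return -1;
    }

    p = map_scattered(fd, offs, 2, page);
    if (p == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    /* Both scattered pages are visible in sequence through p. */
    ok = p[0] == 'A' && p[page] == 'B';

    /* A write through the mapping reaches the backing page. */
    p[page + 1] = 'Z';
    msync(p, 2 * page, MS_SYNC);
    ok = ok && pread(fd, &c, 1, offs[1] + 1) == 1 && c == 'Z';

    munmap(p, 2 * page);
    fclose(f);
    return ok ? 0 : -1;
}
```

The cost is one mmap call per page when the mapping is set up; after that the
renderer sees an ordinary pointer. Whether that setup cost is acceptable for
short-lived surfaces still needs the benchmark mentioned above.
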
> The additional cost is not large - I suppose rendering should be more
> costly than a memcpy. But the question is valid regardless of this -
> some surfaces should probably be punted, if we had an oracle to know they
> would be immediately update_area'ed and destroyed.

Actually we could delay pushing the commands to the server until there is a
command that relies on this surface, or an update_area occurs. If update_area
happens first, the surface has not been created yet (we only need to store
the commands), and we can do the whole thing on the guest - we can't punt, since
it is too late, but we can create the gdi surface ourselves and replay all
the commands. If a command relying on this surface happens first (BitBlt to
another surface), then we push all the commands to the server. This would play well
with making the command ring hold bunches like the release ring already does, instead
of individual commands.
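
A rough sketch of that deferral, only to pin down the state transitions. All
names are made up, commands are plain ints standing in for real draw
commands, and flushing to the server when the queue fills up is my own
assumption, not something decided above.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_PENDING = 64 };

enum surface_state {
    SURFACE_DEFERRED,   /* commands only stored, nothing sent yet */
    SURFACE_ON_SERVER,  /* stored commands were pushed as one bunch */
    SURFACE_ON_GUEST    /* stored commands were replayed locally */
};

struct surface {
    enum surface_state state;
    int pending[MAX_PENDING];  /* stand-in for real draw commands */
    size_t n_pending;
    size_t pushed;    /* commands sent to the server so far */
    size_t replayed;  /* commands rendered on the guest so far */
};

static void surface_init(struct surface *s)
{
    memset(s, 0, sizeof(*s));
    s->state = SURFACE_DEFERRED;
}

/* Push every stored command to the server as a single bunch. */
static void flush_to_server(struct surface *s)
{
    s->pushed += s->n_pending;
    s->n_pending = 0;
    s->state = SURFACE_ON_SERVER;
}

/* Create the surface on the guest and replay the stored commands there. */
static void replay_on_guest(struct surface *s)
{
    s->replayed += s->n_pending;
    s->n_pending = 0;
    s->state = SURFACE_ON_GUEST;
}

/* A draw command targeting this surface. */
static void surface_draw(struct surface *s, int cmd)
{
    if (s->state == SURFACE_DEFERRED && s->n_pending == MAX_PENDING)
        flush_to_server(s);  /* assumption: give up deferring when full */

    switch (s->state) {
    case SURFACE_DEFERRED:
        s->pending[s->n_pending++] = cmd;
        break;
    case SURFACE_ON_SERVER:
        s->pushed++;
        break;
    case SURFACE_ON_GUEST:
        s->replayed++;
        break;
    }
}

/* update_area arrived first: do the whole thing on the guest. */
static void surface_update_area(struct surface *s)
{
    if (s->state == SURFACE_DEFERRED)
        replay_on_guest(s);
}

/* Another command (e.g. a BitBlt) reads from this surface first. */
static void surface_used_as_source(struct surface *s)
{
    if (s->state == SURFACE_DEFERRED)
        flush_to_server(s);
}
```

What should happen to a surface after it has been replayed on the guest
(keep rendering it locally, as sketched, or hand it to the server later) is
left open here.
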

> 
> > 
> > BTW, we should test whether the split into vram (surfaces) and devram
> > (commands and others) is more efficient than having one section.
> > Even if it is more efficient, we can remove the split and give
> > the surfaces higher allocation priority on a part of the pci bar.
> > Anyway, by default, we can try allocating surfaces in guest RAM.
> > If that fails, we can try to allocate on the pci-bar.
> > 
> 
> Right. What I was aiming at is removing the BAR altogether. This reduces
> per-VM allocation, and we can still ensure a maximum via the driver. It
> also reduces PCI requirements, which are a problem with more than one card.
> 
> Actually the more productive thing for reducing PCI memory would be to change
> to a single card for multiple monitor support. Another reason for allocating
> on guest RAM is to make migration simpler (but I'm not sure it really is).
> 
> > Cheers,
> > Yonit.
> > 
> > 