[Intel-gfx] [PATCH v3 3/8] drm/i915: Partition the fence registers for vGPU in i915 driver

Daniel Vetter daniel at ffwll.ch
Thu Dec 18 00:08:02 PST 2014


On Thu, Dec 18, 2014 at 12:36:29AM +0000, Tian, Kevin wrote:
> > From: Daniel Vetter
> > Sent: Thursday, December 18, 2014 1:10 AM
> > 
> > On Wed, Dec 17, 2014 at 11:50:13AM +0000, Tvrtko Ursulin wrote:
> > >
> > > On 12/17/2014 11:25 AM, Yu, Zhang wrote:
> > > >On 12/17/2014 7:06 PM, Gerd Hoffmann wrote:
> > > >>   Hi,
> > > >>
> > > >>>>It's not possible to allow guests direct access to the fence registers
> > > >>>>though.  And if every fence register access traps into the hypervisor
> > > >>>>anyway the hypervisor can easily map the guest virtual fence to host
> > > >>>>physical fence, so there is no need to tell the guest which fences it
> > > >>>>owns, the number of fences is enough.
> > > >>>
> > > >>>That exactly is the part I don't understand - if it is not required to
> > > >>>tell the guest which fences it owns, why it is required to say how many?
> > > >>
> > > >>There is a fixed assignment of fences to guests, so it's a fixed number.
> > > >>But as the hypervisor is involved in any fence access anyway there is no
> > > >>need for the guest to know which of the fences it owns, the hypervisor
> > > >>can remap that transparently for the guest, without performance penalty.
> > > >Thanks Gerd. Exactly.
> > > >Although fence registers are partitioned to vGPUs, it is not necessary
> > > >for a vGPU to know the physical mmio addresses of the allocated fence
> > > >registers.
> > > >For example, vGPU 1 with fence size 4 can access the fence registers
> > > >from 0x100000-0x10001f; at the same time, vGPU 2 with fence size 8 can
> > > >access the fence registers from 0x100000-0x10003f. Although this seems
> > > >conflicting, it does not matter. Because these mmio addresses are all
> > > >supposed to be trapped on the host side, which will keep a record of the
> > > >real fence offset of different vGPUs (say 0 for vGPU 1 and 4 for vGPU 2),
> > > >and then do the remapping. Therefore, the physical operations on the
> > > >fence registers will be performed by host code on different ones (say,
> > > >0x100000-0x10001f for vGPU 1 and 0x100020-0x10005f for vGPU 2).
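The remapping described above can be sketched in a few lines. Everything here is illustrative, not the actual xengt code: `struct vgpu` and `remap_fence_mmio` are made-up names, and the 8-bytes-per-fence layout assumes gen6+-style 64-bit fence registers at a single base:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each vGPU is assigned a base fence index on the
 * host, and a trapped guest access to its virtual fence N is redirected
 * to physical fence base + N.  8 bytes per 64-bit fence register. */
#define FENCE_REG_BASE  0x100000u
#define FENCE_REG_SIZE  8u

struct vgpu {
	unsigned int fence_base;	/* first physical fence owned */
	unsigned int fence_num;		/* number of fences assigned  */
};

/* Translate a trapped guest fence MMIO offset into the host offset. */
static uint32_t remap_fence_mmio(const struct vgpu *v, uint32_t guest_off)
{
	uint32_t rel = guest_off - FENCE_REG_BASE;
	unsigned int idx = rel / FENCE_REG_SIZE;

	assert(idx < v->fence_num);	/* guest stays inside its quota */
	return FENCE_REG_BASE +
	       (v->fence_base + idx) * FENCE_REG_SIZE +
	       rel % FENCE_REG_SIZE;
}
```

With vGPU 1 at base 0 and vGPU 2 at base 4, both guests write 0x100000 for their first fence, but vGPU 2's access lands on the physical register at 0x100020, matching the example above.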
> > >
> > > Okay, I think I get it now. What I had in mind is not really possible
> > > without a dedicated hypervisor<->guest communication channel. Or in
> > > other words you would have to extend the way i915 allocates them from
> > > mmio writes to something bi-directional.
> > 
> > You could virtualize fences the same way we virtualize fences for
> > userspace for gtt mmap access: If we need to steal a fences we simply need
> > to unmap the relevant gtt mmio range from the guest ptes. This should work
> > well since on current platforms the only thing that really needs fences is
> > cpu access, the gpu doesn't need them. Well except for some oddball cases
> > in the display block, but those are virtualized anyway (not fbc for guests
> > or anything else like that).
> 
> doing this w/o guest's awareness is challenging. bear in mind that vCPU 
> scheduling (in hypervisor) is split from vGPU scheduling (in xengt driver).
> so the xengt driver doesn't know when cpu access happens, and thus
> stealing a fence under the hood is problematic.

Well i915 also doesn't interfere with the linux scheduler at all to manage
fences. All you have to do is kick out the ptes before yanking the fence
away, no need to talk to the scheduler at all. Ofc if some other vm is
using that area it'll immediately fault on the cpu, but the fault handler
can then rectify this and steal some other fence that hopefully isn't
needed again right away.
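A minimal sketch of that steal-on-fault path, assuming a hypothetical table of fences with last-used timestamps and a stubbed-out PTE zap (none of these names are real i915 code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sketch: when a VM faults on an unmapped fenced range,
 * the handler grabs the least-recently-used fence that is not pinned,
 * invalidates the previous owner's PTEs (stubbed here) and rebinds it,
 * without ever talking to the CPU scheduler. */
#define NUM_FENCES 16

struct fence_reg {
	int owner_vm;		/* -1 if free */
	bool pinned;		/* in active use right now */
	unsigned long last_used;
};

static struct fence_reg fences[NUM_FENCES];

/* Stand-in for zapping the old owner's GTT PTEs so that its next CPU
 * access faults and re-enters this path. */
static void zap_owner_ptes(struct fence_reg *f)
{
	f->owner_vm = -1;
}

static struct fence_reg *steal_fence(int vm, unsigned long now)
{
	struct fence_reg *victim = NULL;
	int i;

	for (i = 0; i < NUM_FENCES; i++) {
		struct fence_reg *f = &fences[i];

		if (f->pinned)
			continue;
		if (!victim || f->last_used < victim->last_used)
			victim = f;
	}
	if (!victim)
		return NULL;	/* all fences pinned: caller must wait */

	zap_owner_ptes(victim);
	victim->owner_vm = vm;
	victim->last_used = now;
	return victim;
}
```

The key point is the one Daniel makes: the steal is driven entirely by faults, so no coordination with vCPU scheduling is needed.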

> based on current enlightenment framework, it's possible to have driver
> allocate/release fences through PVINFO window. That way we can avoid
> the current static partitioning scheme, and the changes shouldn't be
> that much.
> 
> btw curious I heard that typically 2-4 fences are OK for most workloads.
> Is that true? If yes, this fence partitioning alone is not the only limiting
> factor on #VM. The 512M aperture size is more limiting (but will be better
> starting from bdw)
> 
> > 
> > This would also fit a bit more closely with how the host manages fences,
> > so benefiting the new kvm/xengt-on-i915 mode for the host instead of the
> > current implementation which also virtualizes host i915 access cycles.
> > -Daniel
> 
> for host yes we won't virtualize fences then. We're evaluating how to
> remove most mediations on the host paths, with a few exceptions, e.g.
> for registers which require manual save/restore at context switch
> time.

My idea would be to use additional MI_LRI/SRM commands before/after the
call to load the context into the ring using the functions in
i915_gem_context.c. This is exactly what the execbuf code does too:
There's lots of additional stuff (per-platform) it does around the actual
context switch. So even there I think you can implement the VM switch on
top of i915 infrastructure. Maybe we need to duplicate/export some
execbuf internals, but that's a lot better than duplicating the entire
driver and having 2 drivers fight over the same hw.
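The restore side of that could look roughly like the following: emit an MI_LOAD_REGISTER_IMM packet with the saved register/value pairs before the context load (the save side would use MI_STORE_REGISTER_MEM symmetrically). The opcode encodings follow the form used in i915's MI_INSTR macros; the emit helper and buffer handling are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* MI command encodings in the i915 MI_INSTR style:
 * LRI with n reg/value pairs has a length field of 2*n - 1. */
#define MI_INSTR(opcode, flags)	(((uint32_t)(opcode) << 23) | (flags))
#define MI_LOAD_REGISTER_IMM(n)	MI_INSTR(0x22, 2 * (n) - 1)
#define MI_STORE_REGISTER_MEM	MI_INSTR(0x24, 1)

/* Hypothetical helper: emit an LRI restoring n register/value pairs
 * into a caller-provided dword buffer, returning the dwords written.
 * In the real driver this would go through the ring emit functions. */
static int emit_lri(uint32_t *cs, int n,
		    const uint32_t *regs, const uint32_t *vals)
{
	int i, len = 0;

	cs[len++] = MI_LOAD_REGISTER_IMM(n);
	for (i = 0; i < n; i++) {
		cs[len++] = regs[i];	/* register offset */
		cs[len++] = vals[i];	/* value to load   */
	}
	return len;
}
```

Wrapping the context load with such emissions is the same pattern the execbuf path already uses for its per-platform fixups around a context switch, which is why it should layer on existing i915 infrastructure.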
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

