[Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 3 14:07:30 UTC 2018


Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 9:25 PM
> >To: Zhao, Yakui <yakui.zhao at intel.com>; Daniel Vetter <daniel at ffwll.ch>
> >Cc: intel-gfx at lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
> >VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >>
> >> >-----Original Message-----
> >> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> >> >Daniel Vetter
> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >To: Chris Wilson <chris at chris-wilson.co.uk>
> >> >Cc: Daniel Vetter <daniel at ffwll.ch>; Zhao, Yakui
> >> ><yakui.zhao at intel.com>; intel-gfx at lists.freedesktop.org
> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >only once on VGPU
> >> >
> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> > > In the VGPU scenario, read/write operations on the fence_reg
> >> >> > > are trapped by GVT-g, which then follows the HW spec to write
> >> >> > > the fence_reg. It is therefore unnecessary to read/write the
> >> >> > > fence reg several times. This helps reduce the unnecessary
> >> >> > > traps on fence_reg mmio operations.
> >> >> > >
> >> >> > > V1->V2: Fix a typo in the parameter when calling
> >> >> > > intel_vgpu_active
> >> >> > >
> >> >> > > Signed-off-by: Zhao Yakui <yakui.zhao at intel.com>
> >> >> >
> >> >> > Ok this makes more sense. Except you need to put the 64bit
> >> >> > write entirely into the vgpu block, with a comment explaining
> >> >> > why this is safe (since the vgpu will take care of updating
> >> >> > fences correctly).
> >> >>
> >> >> Except, who cares? Are fence registers being rewritten so
> >> >> frequently that special-casing vgpu is worth the hassle? Part of
> >> >> that is that you need to leave a hint behind in the code that (a)
> >> >> explains why it is safe after having the "here be dragons" and
> >> >> (b) why we care.
> >> >>
> >> >> On a more pragmatic level if fencing doesn't plateau out to steady
> >> >> state, that is a worrying amount of contention -- the actual fence
> >> >> write itself would be the least of my worries.
> >> >
> >> >I can easily imagine that with the few per-client fences vgpu hands
> >> >out, rewrites are much more common. But yeah, some real data would
> >> >be good. And more reasons to get mesa off of the gtt mmaps.
> >>
> >> Hi, Daniel/Chris
> >>
> >>       Thanks for your comments.
> >>       The fence reg is used to ensure access to a tiled surface
> >> through the aperture window. When a fence is needed, the driver
> >> finds an available fence reg and configures it. When no longer
> >> needed, the fence is turned off and can then be allocated for the
> >> next use. Nothing relies on the previous state of the fence reg, so
> >> we don't need to worry about an unsteady state.
> >>
> >>       For the VGPU operation: access to the fence reg is trapped,
> >> and GVT-g then programs the fence_reg with the trapped value. (For
> >> any trapped write of the fence reg, it turns the fence off and then
> >> writes the expected value.) So the trapped op in GVT-g is safe.
> >>
> >>       Based on the current logic, five traps are needed when one
> >> fence reg is configured under VGPU mode (three writes, two reads).
> >> If it is instead programmed with one 64-bit op under VGPU mode, only
> >> one trap is needed, and GVT-g can still configure the expected fence
> >> value. As each trap is quite heavy for a VGPU, the trap time can be
> >> saved.
> >
> >But the argument is: can we avoid it entirely by never changing the
> >fence? You say this is used for mapping through the aperture (GTT); we
> >say userspace shouldn't be doing that for performance reasons :) A
> >slow trap on top of a slow operation that is already causing
> >contention seems more sensible to fix at source. (Albeit so long as
> >the maintenance burden is considered and found to be reasonable,
> >adding special cases with their rationale is acceptable.) So you have
> >to sell why this mmio is worthy of special attention and curtail any
> >future questions.
> 
> If the userspace driver/app takes care of buffer allocation, especially
> for tiled surfaces, it may reduce the rate of fence changes. But fences
> can't be avoided entirely if a tiled buffer is needed and allocated.
> This also depends on the userspace driver, and it is beyond the
> responsibility of the kernel driver.

We own the stack. If userspace is not behaving as well as it might, we
need to let them know. Improving algorithms is several orders more
beneficial than micro-optimising the implementation. (Speaking as one
who sadly does do much of the latter.) But you are asking us to maintain
this path with neither any CI coverage nor any indication of the
userspace impact.
 
> If it is configured in non-VGPU mode, the several writes of fence_reg
> don't matter. But under VGPU mode, each trap of the fence_reg causes an
> exit from the virtual machine. After the trap emulation is finished, it
> returns to the guest OS and resumes the following sequence (for
> example, on KVMGT it exits to the Linux host from the guest OS). The
> exit from the guest OS is quite costly (for example, it needs to check
> the exit reason and decide who/how to do the trap emulation). As
> mentioned in the previous email, the current sequence on VGPU needs
> five traps when one fence reg is programmed (three writes, two reads).
> If only one trap is needed while the fence reg is still configured
> correctly, the time of the unnecessary traps is saved, which helps
> improve efficiency in the guest OS.

Numbers help. We understand why several traps are slower than one; what
we don't know is what impact that actually has on userspace, and whether
it is worth optimising this register versus every other.

What is the before/after of something like gem_fence_upload?
-Chris
