[Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU

Chris Wilson chris at chris-wilson.co.uk
Wed Jul 4 09:40:05 UTC 2018


Quoting Zhao, Yakui (2018-07-04 03:09:03)
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 10:08 PM
> >To: Zhao, Yakui <yakui.zhao at intel.com>; Daniel Vetter <daniel at ffwll.ch>
> >Cc: intel-gfx at lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
> >VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >> >-----Original Message-----
> >> >From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> >> >Sent: Tuesday, July 3, 2018 9:25 PM
> >> >To: Zhao, Yakui <yakui.zhao at intel.com>; Daniel Vetter
> >> ><daniel at ffwll.ch>
> >> >Cc: intel-gfx at lists.freedesktop.org
> >> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >only once on VGPU
> >> >
> >> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of
> >> >> >Daniel Vetter
> >> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >> >To: Chris Wilson <chris at chris-wilson.co.uk>
> >> >> >Cc: Daniel Vetter <daniel at ffwll.ch>; Zhao, Yakui
> >> >> ><yakui.zhao at intel.com>; intel-gfx at lists.freedesktop.org
> >> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >> >only once on VGPU
> >> >> >
> >> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> >> > > In the VGPU scenario the read/write operations on the
> >> >> >> > > fence_reg are trapped by GVT-g, which then follows the HW
> >> >> >> > > spec to write the fence_reg. So it is unnecessary to
> >> >> >> > > read/write the fence reg several times; this helps to reduce
> >> >> >> > > the unnecessary traps caused by fence_reg mmio operations.
> >> >> >> > >
> >> >> >> > > V1->V2: Fix a typo in the parameter passed when calling
> >> >> >> > > intel_vgpu_active
> >> >> >> > >
> >> >> >> > > Signed-off-by: Zhao Yakui <yakui.zhao at intel.com>
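
A minimal, self-contained sketch of the shape being debated (not the
actual patch: mmio_write32()/mmio_write64()/mmio_posting_read() and the
vgpu flag are invented stand-ins for the driver's I915_WRITE/POSTING_READ
macros, their 64-bit variant and intel_vgpu_active(), and the register
offsets are arbitrary):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in MMIO helpers: in a guest, each call is one trap/VM exit. */
static unsigned int traps;

static void mmio_write32(uint32_t reg, uint32_t val)
{
	traps++;
	printf("write32 %#x <- %#010x\n", (unsigned)reg, (unsigned)val);
}

static void mmio_write64(uint32_t reg, uint64_t val)
{
	traps++;
	printf("write64 %#x <- %#018llx\n", (unsigned)reg,
	       (unsigned long long)val);
}

static void mmio_posting_read(uint32_t reg)
{
	traps++;
	printf("read    %#x\n", (unsigned)reg);
}

static void fence_write(uint32_t lo, uint32_t hi, uint64_t val, bool vgpu)
{
	if (vgpu) {
		/* The host (GVT-g) traps this single 64-bit write and
		 * performs the safe disable/program/enable sequence on the
		 * real register itself, so one trap is enough. */
		mmio_write64(lo, val);
		return;
	}

	/* Bare metal: split the 64-bit update into 32-bit writes, turning
	 * the fence off first so a partial value is never sampled, with
	 * posting reads ordering the steps: 3 writes + 2 reads. */
	mmio_write32(lo, 0);
	mmio_posting_read(lo);
	mmio_write32(hi, (uint32_t)(val >> 32));
	mmio_write32(lo, (uint32_t)val);
	mmio_posting_read(lo);
}

int main(void)
{
	const uint32_t fence_lo = 0x100040, fence_hi = 0x100044;
	const uint64_t val = 0x00000000fe000001ull;

	fence_write(fence_lo, fence_hi, val, false);
	printf("bare-metal accesses: %u\n", traps);	/* 5 */

	traps = 0;
	fence_write(fence_lo, fence_hi, val, true);
	printf("vgpu traps:          %u\n", traps);	/* 1 */
	return 0;
}

Run standalone, the bare-metal path prints the five accesses being
counted in this thread, while the vgpu path collapses to a single
trapped write; whether that saving justifies the special case is what
the rest of the thread argues about.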
> >> >> >> >
> >> >> >> > Ok this makes more sense. Except you need to put the 64bit write
> >> >> >> > entirely into the vgpu block, with a comment explaining why
> >> >> >> > this is safe (since the vgpu will take care of updating fences correctly).
> >> >> >>
> >> >> >> Except, who cares? Are fence registers being rewritten so
> >> >> >> frequently that special-casing vgpu is worth the hassle? Part of
> >> >> >> that is that you need to leave a hint behind in the code that
> >> >> >> (a) explains why it is safe after having the "here be dragons"
> >> >> >> and (b) why we care.
> >> >> >>
> >> >> >> On a more pragmatic level, if fencing doesn't plateau out to a
> >> >> >> steady state, that is a worrying amount of contention -- the
> >> >> >> actual fence write itself would be the least of my worries.
> >> >> >
> >> >> >I can easily imagine that, with the few per-client fences vgpu
> >> >> >hands out, rewrites are much more common. But yeah, some real data
> >> >> >would be good.
> >> >> >And more reasons to get mesa off of the gtt mmaps.
> >> >>
> >> >> Hi, Daniel/Chris
> >> >>
> >> >>       Thanks for your comments.
> >> >>       The fence reg is used to assure access to a tiled surface
> >> >> through the aperture window. When a fence is needed, the driver
> >> >> finds one available fence reg and configures it. Once it is no
> >> >> longer used, the fence is turned off and can then be allocated for
> >> >> the next user. Programming a fence does not rely on the previous
> >> >> state of the fence reg, so in this case we don't need to worry
> >> >> about an unsteady state.
> >> >>
> >> >>       For the VGPU operation: the access to the fence reg is
> >> >> trapped, and GVT-g then uses the trapped value to program the
> >> >> fence_reg. (For any trapped write to a fence reg it will turn the
> >> >> fence off and then write the expected value.) The trapped operation
> >> >> in GVT-g is safe.
> >> >>
> >> >>       Based on the current logic, five traps are needed when one
> >> >> fence reg is configured under VGPU mode (three writes, two reads).
> >> >> If it is programmed with one 64-bit op under VGPU mode, only one
> >> >> trap is needed, and GVT-g can still configure the expected fence
> >> >> value. As a trap is quite heavy for a VGPU, the trap time can be
> >> >> saved.
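
For completeness, a hypothetical sketch of the host side of that single
trap (this is not GVT-g code: handle_fence_write64(), hw_write32() and
hw_posting_read() are invented names, used only to show how the host can
replay the full disable/program/enable sequence from one trapped 64-bit
guest write):

#include <stdint.h>
#include <stdio.h>

/* Invented stand-ins for host-side MMIO to the physical fence register. */
static void hw_write32(uint32_t reg, uint32_t val)
{
	printf("host write32 %#x <- %#010x\n", (unsigned)reg, (unsigned)val);
}

static void hw_posting_read(uint32_t reg)
{
	printf("host read    %#x\n", (unsigned)reg);
}

/*
 * One trapped 64-bit fence write from the guest: the host turns the
 * fence off, programs both halves, and only then re-enables it, so the
 * guest never has to issue the five-access sequence across the trap
 * boundary and no partial fence value is ever visible to the hardware.
 */
static void handle_fence_write64(uint32_t lo, uint32_t hi, uint64_t guest_val)
{
	hw_write32(lo, 0);			/* disable the fence first */
	hw_posting_read(lo);
	hw_write32(hi, (uint32_t)(guest_val >> 32));
	hw_write32(lo, (uint32_t)guest_val);	/* low dword written last */
	hw_posting_read(lo);
}

int main(void)
{
	/* Arbitrary example offsets and value for one fence register. */
	handle_fence_write64(0x100040, 0x100044, 0x00000000fe000001ull);
	return 0;
}

The correctness claim above (GVT-g can still program the expected fence
value from the single trapped value) is what makes collapsing the guest
side to one write safe.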
> >> >
> >> >But the argument is: can we avoid it entirely by never changing the
> >> >fence? You say this is used for mapping through the aperture (GTT);
> >> >we say userspace shouldn't be doing that for performance reasons :) A
> >> >slow trap on top of a slow operation that is already causing
> >> >contention seems more sensible to fix at source. (Albeit so long as
> >> >the maintenance burden is considered and found to be reasonable,
> >> >adding special cases with their rationale is acceptable.) So you have
> >> >to sell why this mmio is worthy of special attention and curtail any
> >> >future questions.
> >>
> >> If the userspace driver/app can take care of the buffer allocation,
> >> especially for tiled surfaces, maybe it can reduce the rate of fence
> >> changes. But this can't be avoided if a tiled buffer is needed and
> >> allocated. This also depends on the userspace driver, and it is
> >> beyond the responsibility of the kernel driver.
> >
> >We own the stack. If userspace is not behaving as well as it might, we need to
> >let them know. Improving algorithms is several orders of magnitude more
> >beneficial than micro-optimising the implementation. (Speaking as one who
> >sadly does do much of the latter.) But you are asking us to maintain this path
> >with neither any CI coverage nor any indication of the userspace impact.
> >
> >> In non-VGPU mode the several writes to the fence_reg don't matter.
> >> But under VGPU mode each trapped fence_reg access causes an exit from
> >> the virtual machine; only after the trap emulation is finished can it
> >> return to the guest OS and resume the following sequence (for
> >> example, on KVMGT it exits from the guest OS to the Linux host). The
> >> guest OS exit is quite costly (for example, it needs to check the
> >> exit reason and work out who does the trap emulation and how). As
> >> mentioned in the previous email, the current sequence on VGPU needs
> >> five traps when one fence reg is programmed (three writes, two
> >> reads). If only one trap is needed while the fence reg is still
> >> configured correctly, the time spent on unnecessary traps for
> >> fence_reg programming can be saved. Of course this will help to
> >> improve the efficiency of the guest OS.
> >
> >Numbers help. We understand why it is slow to do several traps versus one;
> >what we don't know is what impact that actually has on userspace, and
> >whether it is worth optimising this versus every other register.
> >
> Hi, Chris
> 
>        The userspace driver/library on the guest OS is not aware that it is running on a virtual machine.

And? Fence thrashing is bad irrespective of virtualisation. Improving
algorithms to avoid fencing should be effective whether or not
userspace is running on a mediated vgpu, and avoiding everything in the
path leading to rewriting fences (struct_mutex!) is so much more
effective than improving the traps.

It is not that one precludes the other; it's just that the
micro-optimisations still need to be justified, or they'll be forgotten
and removed.

> In such a case the trapping of fence_reg on VGPU is transparent to the userspace driver/library.
> Only when running a performance test does the impact of so many traps show up.

But how much so?
 
> If the traps of other registers can also be optimized, it will help the performance of the guest OS. Of course the
> prerequisite is that we collect the corresponding trap info when running the corresponding workload.
> This is case by case.
>
> >What is the before/after of something like gem_fence_upload?
> 
> I checked the case of gem_fence_upload. It only measures the upload performance after the fence is already programmed.
> So this patch has no impact on gem_fence_upload.

Incorrect. By the same token, you haven't measured the impact of this
change yourself?
-Chris

