[Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

Marek Olšák maraeo at gmail.com
Sun Apr 28 00:46:33 UTC 2019


The input lag is maxframes*1000/fps ms. For 10 fps and 2 frames, the lag is
200 ms. You might also say that it's the amount of time it will take the
GPU to catch up with the CPU after fence_finish returns.
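
A minimal sketch of that arithmetic in C (the helper name is made up for
illustration; it is not code from the patch):

    /* Worst-case input lag added by frame throttling: the CPU may run up to
     * max_frames ahead of the GPU, and each frame takes 1000/fps ms. */
    static unsigned
    input_lag_ms(unsigned max_frames, unsigned fps)
    {
       return max_frames * 1000 / fps;
    }

    /* Example: 2 frames in flight at 10 fps -> 2 * 1000 / 10 = 200 ms. */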

Hopefully that clears up performance concerns.

Marek

On Sat, Apr 27, 2019, 4:07 PM Axel Davy <davyaxel0 at gmail.com> wrote:

> On 27/04/2019 21:02, Rob Clark wrote:
> > On Sat, Apr 27, 2019 at 9:52 AM Axel Davy <davyaxel0 at gmail.com> wrote:
> >> On 27/04/2019 18:13, Rob Clark wrote:
> >>> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo at gmail.com> wrote:
> >>>> From: Marek Olšák <marek.olsak at amd.com>
> >>>>
> >>>> It's done by:
> >>>> - decrease the number of frames in flight by 1
> >>>> - flush before throttling in SwapBuffers
> >>>>     (instead of wait-then-flush, do flush-then-wait)
> >>>>
> >>>> The improvement is apparent with Unigine Heaven.
> >>>>
> >>>> Previously:
> >>>>       draw frame 2
> >>>>       wait frame 0
> >>>>       flush frame 2
> >>>>       present frame 2
> >>>>
> >>>>       The input lag is 2 frames.
> >>>>
> >>>> Now:
> >>>>       draw frame 2
> >>>>       flush frame 2
> >>>>       wait frame 1
> >>>>       present frame 2
> >>>>
> >>>>       The input lag is 1 frame. Flushing is done before waiting, because
> >>>>       otherwise the device would be idle after waiting.
> >>>>
> >>>> Nine is affected because it also uses the pipe cap.
> >>>> ---
> >>>>    src/gallium/auxiliary/util/u_screen.c         |  2 +-
> >>>>    src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
> >>>>    2 files changed, 11 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
> >>>> index 27f51e0898e..410f17421e6 100644
> >>>> --- a/src/gallium/auxiliary/util/u_screen.c
> >>>> +++ b/src/gallium/auxiliary/util/u_screen.c
> >>>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
> >>>>       case PIPE_CAP_MAX_VARYINGS:
> >>>>          return 8;
> >>>>
> >>>>       case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >>>>          return 0;
> >>>>
> >>>>       case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >>>>          return 0;
> >>>>
> >>>>       case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >>>> -      return 2;
> >>>> +      return 1;
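
A minimal C sketch of the two orderings described in the commit message above,
using hypothetical helper names rather than the actual dri_drawable.c code:

    /* Hypothetical helpers standing in for the state tracker's internals. */
    void flush_current_frame(void);     /* submit frame N to the GPU            */
    void wait_for_old_frame(int age);   /* CPU-wait on the fence of frame N-age */
    void present_current_frame(void);

    /* Previously: wait-then-flush, so up to 2 frames separate input from display. */
    void swap_buffers_old(void)
    {
       wait_for_old_frame(2);       /* wait on frame N-2 */
       flush_current_frame();       /* flush frame N     */
       present_current_frame();
    }

    /* Now: flush-then-wait; flushing first keeps the GPU fed while we wait,
     * and waiting on frame N-1 limits the input lag to one frame. */
    void swap_buffers_new(void)
    {
       flush_current_frame();       /* flush frame N     */
       wait_for_old_frame(1);       /* wait on frame N-1 */
       present_current_frame();
    }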
> >>> would it be safer to leave the current default and let drivers opt-in
> >>> to the lower # individually?  I guess triple buffering would still be
> >>> better for some of the smaller gpu's?
> >>>
> >>> disclaimer: I haven't tested this myself or looked very closely at the
> >>> dri code.. so could be misunderstanding something..
> >>>
> >>> BR,
> >>> -R
> >>
> >> I think I can shed some light on the issue to justify (or not) the change.
> >>
> >> If we don't do throttling and the CPU submits frames at a faster rate
> >> than the GPU can render them, the GPU can fall far behind and cause
> >> huge frame lag.
> >>
> >> The throttling involves forcing a (CPU) wait when a frame is presented
> >> if the frame presented 'x' frames earlier has not finished rendering.
> >>
> >> The lower 'x', the lower the frame lag.
> >>
> >> However if 'x' is too low (waiting until the current frame is rendered,
> >> for example), the GPU can sit idle until the CPU flushes new commands.
> >>
> >> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> >> reasonable values.
> >>
> >> If 'x' is 1, we wait until the previous frame has been rendered when we
> >> present the current frame. For 2, we wait on the frame before that.
> >>
> >>
> >> Thus for smaller GPUs, a value of 1 is better than 2, as they are more
> >> affected by frame lag (their frames take longer to render).
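
A sketch of that throttle with a configurable 'x', assuming a simple ring of
fences (the names are hypothetical, not the actual state tracker code):

    #include <assert.h>

    #define MAX_RING 8

    struct fence;
    void fence_finish(struct fence *f);   /* CPU-wait until the fence signals */

    static struct fence *ring[MAX_RING];
    static unsigned head, count;

    /* Called when presenting a frame: keep at most 'x' frames in flight by
     * waiting on the fence of the frame presented 'x' frames earlier.
     * Smaller 'x' means less input lag; too small leaves the GPU idle. */
    void throttle(struct fence *current_frame_fence, unsigned x)
    {
       assert(x >= 1 && x <= MAX_RING);

       if (count >= x) {
          unsigned oldest = (head + MAX_RING - count) % MAX_RING;
          fence_finish(ring[oldest]);   /* block until frame N-x is done */
          count--;
       }
       ring[head] = current_frame_fence;
       head = (head + 1) % MAX_RING;
       count++;
    }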
> >>
> > I get the latency aspect.. but my comment was more about latency vs
> > framerate (or maybe more cynically, about games vs benchmarks :-P)
> >
> > BR,
> > -R
>
>
> As long as the GPU has work to do, performance should be maximized.
>
> However, in the case I described below, if the CPU and GPU render at about
> the same framerate and the framerate has some variation (whether it is the
> GPU taking more time for one frame, or the CPU), using more than 1 would
> give a bit better performance.
>
>
> Axel
>
>
>
> >
> >> However if a game renders at a very unstable framerate (some frames
> >> needing more work than others), using a value of 2 is safer for
> >> maximizing performance: a value of 1 would force a wait whenever a frame
> >> takes particularly long, whereas 2 smooths that out a bit.
> >>
> >>
> >> I remember years ago I had extremely unstable fps when using Catalyst on
> >> Portal, for example. But I believe that was more a driver issue than a
> >> game issue.
> >>
> >> If we assume games don't have an unstable framerate (which seems a
> >> reasonable assumption), using 1 as the default makes sense.
> >>
> >>
> >> If one wants to test experimentally for regressions, the ideal test case
> >> is when the GPU renders at about the same framerate as the CPU feeds it.
> >>
> >>
> >>
> >> Axel
> >>
> >>
> >>
> >>
>
>