<div dir="auto">The input lag is maxframes*1000/fps ms. For 10 fps and 2 frames, the lag is 200 ms. You might also say that it's the amount of time it will take the GPU to catch up with the CPU after fence_finish returns.<div dir="auto"> </div><div dir="auto">Hopefully that clears up performance concerns.</div><div dir="auto"> </div><div dir="auto">Marek</div></div> <div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Apr 27, 2019, 4:07 PM Axel Davy <<a href="mailto:davyaxel0@gmail.com">davyaxel0@gmail.com</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 27/04/2019 21:02, Rob Clark wrote: > On Sat, Apr 27, 2019 at 9:52 AM Axel Davy <<a href="mailto:davyaxel0@gmail.com" target="_blank" rel="noreferrer">davyaxel0@gmail.com</a>> wrote: >> On 27/04/2019 18:13, Rob Clark wrote: >>> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <<a href="mailto:maraeo@gmail.com" target="_blank" rel="noreferrer">maraeo@gmail.com</a>> wrote: >>>> From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank" rel="noreferrer">marek.olsak@amd.com</a>> >>>> >>>> It's done by: >>>> - decrease the number of frames in flight by 1 >>>> - flush before throttling in SwapBuffers >>>> (instead of wait-then-flush, do flush-then-wait) >>>> >>>> The improvement is apparent with Unigine Heaven. >>>> >>>> Previously: >>>> draw frame 2 >>>> wait frame 0 >>>> flush frame 2 >>>> present frame 2 >>>> >>>> The input lag is 2 frames. >>>> >>>> Now: >>>> draw frame 2 >>>> flush frame 2 >>>> wait frame 1 >>>> present frame 2 >>>> >>>> The input lag is 1 frame. Flushing is done before waiting, because >>>> otherwise the device would be idle after waiting. >>>> >>>> Nine is affected because it also uses the pipe cap. >>>> --- >>>> src/gallium/auxiliary/util/u_screen.c | 2 +- >>>> src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++---------- >>>> 2 files changed, 11 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c >>>> index 27f51e0898e..410f17421e6 100644 >>>> --- a/src/gallium/auxiliary/util/u_screen.c >>>> +++ b/src/gallium/auxiliary/util/u_screen.c >>>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen, >>>> case PIPE_CAP_MAX_VARYINGS: >>>> return 8; >>>> >>>> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK: >>>> return 0; >>>> >>>> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES: >>>> return 0; >>>> >>>> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT: >>>> - return 2; >>>> + return 1; >>> would it be safer to leave the current default and let drivers opt-in >>> to the lower # individually? I guess triple buffering would still be >>> better for some of the smaller gpu's? >>> >>> disclaimer: I haven't tested this myself or looked very closely at the >>> dri code.. so could be misunderstanding something.. >>> >>> BR, >>> -R >> >> I think I can shed some light on the issue to justify (or not) the change. >> >> If we don't do throttling and the CPU renders frames at a faster rate >> than what the GPU can render, >> the GPU can become quite late and cause huge frame lag. >> >> The throttling involves forcing a (CPU) wait when a frame is presented >> if the 'x' previous images have not finished rendering. >> >> The lower 'x', the lower the frame lag. >> >> However if 'x' is too low (waiting current frame is rendered for >> example), the GPU can be idle until the CPU is flushing new commands. >> >> Now there is a choice to be made for the value of 'x'. 1 or 2 are >> reasonable values. >> >> if 'x' is 1, we wait the previous frame was rendered when we present the >> current frame. For '2' we wait the frame before. >> >> >> Thus for smaller gpu's, a value of 1 is better than 2 as it is more >> affected by the frame lag (as frames take longer to render). >> > I get the latency aspect.. but my comment was more about latency vs > framerate (or maybe more cynically, about games vs benchmarks :-P) > > BR, > -R As long at the GPU has work to do, performance should be maximized. However in the case I described below, if CPU and GPU render at about the same framerate and the framerate has some variations (whether it being the GPU taking more time for one frame, or the CPU), using more than 1 would give a bit better performance. Axel > >> However if a game is rendering at a very unstable framerate (some frames >> needing more work than others), using a value of 2 is safer >> to maximize performance. (As using a value of 1 would lead to wait if we >> get a frame that takes particularly long, as using 2 smooths that a bit) >> >> >> I remember years ago I had extremely unstable fps when using catalyst on >> Portal for example. But I believe it is more a driver issue than a game >> issue. >> >> If we assume games don't have unstable framerate, (which seems >> reasonable assumption), using 1 as default makes sense. >> >> >> If one wants to test experimentally for regression, the ideal test case >> if when the GPU renders at about the same framerate as the CPU feeds it. >> >> >> >> Axel >> >> >> >> </blockquote></div>