[Mesa-dev] [PATCH 0/8] i965: Don't recycle BOs until they are idle

Sat Jun 16 06:23:49 UTC 2018

On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <eric at anholt.net> wrote:

> Michel Dänzer <michel at daenzer.net> writes:
>
> > On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
> >> On June 15, 2018 01:14:24 Michel Dänzer <michel at daenzer.net> wrote:
> >>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
> >>>>
> > >>>> I did some testing and x11perf -copywinwin500 is... exactly the same
> > >>>> with or without my patches.  If anything they might improve it by just
> > >>>> a hair.
> >>>
> >>> Possible explanations I can think of:
> >>>
> > >>> 1. Your glamor still has its own FBO cache. Which version of xserver are
> > >>> you testing with?
> >>>
> >> 1.19 I think
> >
> > Okay, that doesn't have the glamor FBO cache anymore.
> >
> >
> >>> 2. The i965 driver cache isn't hit even before these changes.
> >>
> > >> It's definitely getting hit in both cases, it just may require a
> > >> slightly larger cache if we aren't recycling BOs until they're idle.
> >
> > It might be more than just slightly, -copywinwin500 can queue many
> > overlapping copies between flushes. Can you compare the maximum total
> > cache size with and without this series?
>
> I suspect it'll be only about a factor of
> how-many-batchbuffers-before-throttling difference -- while the
> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
> to reuse it anyway.  I suspect we hit the aperture limit and flush in
> the copywinwin500 case.
>

At Ken's suggestion, I ran some statistics for hits/misses.  I did three
runs each with master and with my branch:

Branch          Run   hits     misses   max_bucket_size
master           1    455868     388         160
master           2    404358     113          34
master           3    497731     363         148
with patches     1    493634     253          85
with patches     2    495667     237          83
with patches     3    454738     358         132

Some of the numbers, as you can see, are rather noisy, but the end result is
about the same: we get at least 1000x as many cache hits as misses when
running that test.  I don't think the choice to recycle busy BOs is really
gaining us anything whatsoever.  It is worth noting that I did all of these
runs in debug builds because I had to use gdb to get the data back out of
the driver (prints inside the GL driver used by glamor don't work too
well).  That probably affected things a bit, but I doubt the end result
would have been much different.

Which raises the question: why does Michel see such a big difference on
radeon?  Is there something else that's causing the slow-down?  Is
recomputing surface layouts expensive?  Is there more VMA shuffling that's
causing problems?