[Mesa-dev] [PATCH 0/8] i965: Don't recycle BOs until they are idle
Jason Ekstrand
jason at jlekstrand.net
Sat Jun 16 06:23:49 UTC 2018
On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <eric at anholt.net> wrote:
> Michel Dänzer <michel at daenzer.net> writes:
>
> > On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
> >> On June 15, 2018 01:14:24 Michel Dänzer <michel at daenzer.net> wrote:
> >>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
> >>>>
> >>>> I did some testing and x11perf -copywinwin500 is... exactly the same
> >>>> with or without my patches. If anything they might improve it by just a
> >>>> hair.
> >>>
> >>> Possible explanations I can think of:
> >>>
> >>> 1. Your glamor still has its own FBO cache. Which version of xserver
> >>> are you testing with?
> >>>
> >> 1.19 I think
> >
> > Okay, that doesn't have the glamor FBO cache anymore.
> >
> >
> >>> 2. The i965 driver cache isn't hit even before these changes.
> >>
> >> It's definitely getting hit in both cases, it just may require a
> >> slightly larger cache if we aren't recycling BOs until they're idle.
> >
> > It might be more than just slightly, -copywinwin500 can queue many
> > overlapping copies between flushes. Can you compare the maximum total
> > cache size with and without this series?
>
> I suspect it'll be only about a factor of
> how-many-batchbuffers-before-throttling difference -- while the
> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
> to reuse it anyway. I suspect we hit the aperture limit and flush in
> the copywinwin500 case.
>
At Ken's suggestion, I ran some statistics for hits/misses. I did three
runs each with master and with my branch:
Master:
  hits = 455868, misses = 388, max_bucket_size = 160
  hits = 404358, misses = 113, max_bucket_size = 34
  hits = 497731, misses = 363, max_bucket_size = 148

With patches:
  hits = 493634, misses = 253, max_bucket_size = 85
  hits = 495667, misses = 237, max_bucket_size = 83
  hits = 454738, misses = 358, max_bucket_size = 132
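
For anyone who wants the hit-to-miss ratios spelled out, here is a small
sketch that computes them from the counts above. The hit and miss figures
are copied from the six runs; the names `runs` and `hit_miss_ratios` are
made up for this illustration and are not anything in the driver.

```python
# Hit/miss counts copied from the six runs listed above.
runs = {
    "master": [(455868, 388), (404358, 113), (497731, 363)],
    "patched": [(493634, 253), (495667, 237), (454738, 358)],
}


def hit_miss_ratios(samples):
    """Return hits divided by misses for each (hits, misses) pair."""
    return [hits / misses for hits, misses in samples]


# Hypothetical helper output: one list of ratios per branch.
ratios = {branch: hit_miss_ratios(samples) for branch, samples in runs.items()}

for branch, values in ratios.items():
    print(branch, [round(v) for v in values])
```
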
Some of the numbers, as you can see, are rather noisy but the end result is
about the same: we get at least 1000x as many cache hits as misses when
running that test. I don't think the choice to recycle busy BOs is really
gaining us anything whatsoever. It is worth noting that I did both sets of
runs in debug builds because I had to use gdb to get the data back
out of the driver (prints inside the GL driver used by glamor don't work
too well). That probably affected things a bit but I doubt the end result
would have been that much different.
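
To illustrate why idle-only recycling might need only a slightly deeper
bucket, here is a toy model. It is not the i965 bufmgr code: `ToyBucket`,
`simulate`, and `gpu_latency` are invented for the sketch, the GPU is
assumed to finish with each BO a fixed number of steps after it is
released, and `max_bucket_size` is assumed to mean the largest number of
BOs sitting in one bucket at once.

```python
from collections import deque


class ToyBucket:
    """Toy model of one size bucket in a BO cache (not the i965 code)."""

    def __init__(self, recycle_busy):
        self.recycle_busy = recycle_busy
        # Freed BOs, oldest first. Each entry is the step at which the
        # (imaginary) GPU stops using that BO.
        self.free = deque()
        self.hits = 0
        self.misses = 0
        self.max_bucket_size = 0

    def alloc(self, now):
        # Reuse the oldest freed BO if the policy allows it; otherwise
        # count a miss, which stands in for allocating a fresh BO.
        if self.free and (self.recycle_busy or self.free[0] <= now):
            self.free.popleft()
            self.hits += 1
        else:
            self.misses += 1

    def release(self, idle_at):
        # Return a BO to the bucket and track the high-water mark.
        self.free.append(idle_at)
        self.max_bucket_size = max(self.max_bucket_size, len(self.free))


def simulate(recycle_busy, steps=1000, gpu_latency=4):
    """Allocate and release one BO per step; return the final bucket."""
    bucket = ToyBucket(recycle_busy)
    for now in range(steps):
        bucket.alloc(now)
        bucket.release(idle_at=now + gpu_latency)
    return bucket


for policy in (True, False):
    b = simulate(recycle_busy=policy)
    print(policy, b.hits, b.misses, b.max_bucket_size)
```

In this model the idle-only policy pays a few extra misses while the bucket
fills to the depth of the GPU latency, and after that every allocation
hits. Real workloads are burstier than one allocation per step, so treat
this as intuition only.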
Which raises the question: why does Michel see such a big difference on
radeon? Is there something else that's causing the slow-down? Is
recomputing surface layouts expensive? Is there more VMA shuffling that's
causing problems?