[Mesa-dev] [PATCH 0/8] i965: Don't recycle BOs until they are idle

Tue Jun 19 00:14:46 UTC 2018

Jason Ekstrand <jason at jlekstrand.net> writes:

> On Mon, Jun 18, 2018 at 2:14 AM, Michel Dänzer <michel at daenzer.net> wrote:
>
>> On 2018-06-16 08:23 AM, Jason Ekstrand wrote:
>> > On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <eric at anholt.net> wrote:
>> >
>> >> Michel Dänzer <michel at daenzer.net> writes:
>> >>
>> >>> On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
>> >>>> On June 15, 2018 01:14:24 Michel Dänzer <michel at daenzer.net> wrote:
>> >>>>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
>> >>>>>>
>> >>>>>> I did some testing and x11perf -copywinwin500 is... exactly the same
>> >>>>>> with or without my patches.  If anything, they might improve it by
>> >>>>>> just a hair.
>> >>>>>
>> >>>>> Possible explanations I can think of:
>> >>>>>
>> >>>>> 1. Your glamor still has its own FBO cache. Which version of xserver
>> >>>>> are you testing with?
>> >>>>>
>> >>>> 1.19 I think
>> >>>
>> >>> Okay, that doesn't have the glamor FBO cache anymore.
>> >>>
>> >>>
>> >>>>> 2. The i965 driver cache isn't hit even before these changes.
>> >>>>
>> >>>> It's definitely getting hit in both cases; it just may require a
>> >>>> slightly larger cache if we aren't recycling BOs until they're idle.
>> >>>
>> >>> It might be more than just slightly, -copywinwin500 can queue many
>> >>> overlapping copies between flushes. Can you compare the maximum total
>> >>> cache size with and without this series?
>> >>
>> >> I suspect it'll be only about a factor of
>> >> how-many-batchbuffers-before-throttling difference -- while the
>> >> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
>> >> to reuse it anyway.  I suspect we hit the aperture limit and flush in
>> >> the copywinwin500 case.
>> >>
>> >
>> > At Ken's suggestion, I ran some statistics for hits/misses.  I did three
>> > runs each with master and with my branch:
>> >
>> > Master:
>> >
>> > hits = 455868,
>> > misses = 388,
>> > max_bucket_size = 160
>> >
>> > hits = 404358,
>> > misses = 113,
>> > max_bucket_size = 34
>> >
>> > hits = 497731,
>> > misses = 363,
>> > max_bucket_size = 148
>> >
>> > With patches:
>> >
>> > hits = 493634,
>> > misses = 253,
>> > max_bucket_size = 85
>> >
>> > hits = 495667,
>> > misses = 237,
>> > max_bucket_size = 83
>> >
>> > hits = 454738,
>> > misses = 358,
>> > max_bucket_size = 132
>> >
>> > Some of the numbers, as you can see, are rather noisy, but the end result
>> > is about the same: we get at least 1000x as many cache hits as misses
>> > when running that test.  I don't think the choice to recycle busy BOs is
>> > really gaining us anything whatsoever.  It is worth noting that I did both
>> > of those runs in debug builds because I had to use gdb to get the data
>> > back out of the driver (prints inside the GL driver used by glamor don't
>> > work too well).  That probably affected things a bit, but I doubt the end
>> > result would have been that much different.
>> >
>> > Which raises the question: why does Michel see such a big difference on
>> > radeon?
>>
>> The glamor FBO cache could reuse the temporary FBO even before flushing,
>> so only one such FBO was ever needed. From what Eric wrote above, it
>> sounds like the i965 cache can only reuse BOs after a flush, so there's
>> relatively little difference between reusing busy BOs or not.
>>
>
> It occurred to me today while talking to Jordan about this stuff that X may
> not be getting busy BO re-use.  We generally only allocate busy BOs for
> renderbuffers.  For textures we expect that there's a decent chance we'll
> map it so we allocate an idle BO.  Guess which one modesetting uses!  Yup,
> textures.  It wasn't getting the busy BO optimization at all.  I hacked up
> mesa to use the busy BO path for textures as well and x11perf
> -copywinwin500 improved by 25%.  It's no 3x but it's enough to make me
> think that this patch series may not be such a good idea. :-(  On the
> upside, I now know how to improve an X11 microbenchmark by 25%. :-)

If you allocate from busy by default, but also throw out and allocate
from idle if you do a full-texture upload, that will probably do
approximately what you would want for X11.
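A minimal sketch of that policy, written against a hypothetical, self-contained model of one BO cache bucket. The names `struct bo_cache`, `bo_alloc_busy_ok()`, `bo_alloc_idle()` and `texture_prepare_full_upload()` are illustrative assumptions, not the i965 brw_bufmgr API. Textures take the most recently freed BO even if the GPU is still using it. Only a full-texture upload, which overwrites every byte from the CPU, swaps a busy BO for an idle cached one or a fresh allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one size bucket of a BO cache; these names are
 * illustrative and do not come from the i965 brw_bufmgr code. */
#define CACHE_CAP 16
#define POOL_CAP 16

struct fake_bo {
   int id;
   bool busy; /* still referenced by GPU work that has not finished */
};

struct bo_cache {
   struct fake_bo *free_list[CACHE_CAP]; /* [0] = least recently freed */
   size_t count;
   struct fake_bo pool[POOL_CAP];        /* stands in for new kernel BOs */
   size_t pool_used;
};

/* "Allocate from the kernel": a brand-new BO is always idle. */
static struct fake_bo *bo_new(struct bo_cache *c)
{
   assert(c->pool_used < POOL_CAP);
   struct fake_bo *bo = &c->pool[c->pool_used];
   bo->id = (int)c->pool_used++;
   bo->busy = false;
   return bo;
}

/* Return a BO to the cache as the most recently freed entry. */
static void bo_release(struct bo_cache *c, struct fake_bo *bo)
{
   assert(c->count < CACHE_CAP);
   c->free_list[c->count++] = bo;
}

/* Remove free_list[i] while keeping the oldest-to-newest order. */
static struct fake_bo *take(struct bo_cache *c, size_t i)
{
   assert(i < c->count);
   struct fake_bo *bo = c->free_list[i];
   for (size_t j = i + 1; j < c->count; j++)
      c->free_list[j - 1] = c->free_list[j];
   c->count--;
   return bo;
}

/* Default path: reuse the most recently freed BO even if it is busy.
 * The assumption is that GPU-only use gets ordered after the work already
 * queued on the BO, so the CPU never has to wait for it. */
static struct fake_bo *bo_alloc_busy_ok(struct bo_cache *c)
{
   return c->count ? take(c, c->count - 1) : bo_new(c);
}

/* Idle path: reuse only a cached BO the GPU has finished with, checking
 * the oldest entries first; if none is idle, make a new one. */
static struct fake_bo *bo_alloc_idle(struct bo_cache *c)
{
   for (size_t i = 0; i < c->count; i++) {
      if (!c->free_list[i]->busy)
         return take(c, i);
   }
   return bo_new(c);
}

/* A full-texture upload overwrites every byte from the CPU, so the old
 * contents are dead: swap a busy BO for an idle one instead of stalling. */
static void texture_prepare_full_upload(struct bo_cache *c,
                                        struct fake_bo **tex_bo)
{
   if ((*tex_bo)->busy) {
      bo_release(c, *tex_bo);
      *tex_bo = bo_alloc_idle(c);
   }
}
```

In this model, a texture that is only ever rendered to or copied by the GPU (the common X11 pixmap case) keeps the busy-BO reuse behind the 25% x11perf gain. A texture the CPU fills completely still avoids waiting on the GPU, because its busy BO goes back to the cache rather than being mapped.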
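Checking the "at least 1000x as many cache hits as misses" figure against the six quoted runs is simple arithmetic. This small helper is a hypothetical illustration, not part of Mesa. It computes the lowest hits-to-misses ratio, which comes out at about 1175:1 (the first master run), so the claim holds for every run on both branches.

```c
#include <assert.h>
#include <stddef.h>

/* Hit/miss counts from the six runs quoted above (three on master,
 * three with the series applied). */
struct cache_run {
   long hits;
   long misses;
};

static const struct cache_run runs[] = {
   {455868, 388}, {404358, 113}, {497731, 363}, /* master */
   {493634, 253}, {495667, 237}, {454738, 358}, /* with patches */
};

/* Smallest hits-to-misses ratio over all runs. */
static double min_hit_ratio(void)
{
   double min = -1.0;
   for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++) {
      assert(runs[i].misses > 0); /* guard against division by zero */
      double r = (double)runs[i].hits / (double)runs[i].misses;
      if (min < 0.0 || r < min)
         min = r;
   }
   return min;
}
```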