[Mesa-dev] [PATCH 0/8] i965: Don't recycle BOs until they are idle

Mon Jun 18 09:14:25 UTC 2018

On 2018-06-16 08:23 AM, Jason Ekstrand wrote:
> On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <eric at anholt.net> wrote:
> 
>> Michel Dänzer <michel at daenzer.net> writes:
>>
>>> On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
>>>> On June 15, 2018 01:14:24 Michel Dänzer <michel at daenzer.net> wrote:
>>>>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
>>>>>>
>>>>>> I did some testing and x11perf -copywinwin500 is... exactly the same
>>>>>> with or without my patches.  If anything they might improve it by just
>>>>>> a hair.
>>>>>
>>>>> Possible explanations I can think of:
>>>>>
>>>>> 1. Your glamor still has its own FBO cache. Which version of xserver
>>>>> are you testing with?
>>>>>
>>>> 1.19 I think
>>>
>>> Okay, that doesn't have the glamor FBO cache anymore.
>>>
>>>
>>>>> 2. The i965 driver cache isn't hit even before these changes.
>>>>
>>>> It's definitely getting hit in both cases; it just may require a
>>>> slightly larger cache if we aren't recycling BOs until they're idle.
>>>
>>> It might be more than just slightly: -copywinwin500 can queue many
>>> overlapping copies between flushes. Can you compare the maximum total
>>> cache size with and without this series?
>>
>> I suspect it'll be only about a factor of
>> how-many-batchbuffers-before-throttling difference -- while the
>> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
>> to reuse it anyway.  I suspect we hit the aperture limit and flush in
>> the copywinwin500 case.
>>
> 
> At Ken's suggestion, I ran some statistics for hits/misses.  I did three
> runs each with master and with my branch:
> 
> Master:
> 
> hits = 455868,
> misses = 388,
> max_bucket_size = 160
> 
> hits = 404358,
> misses = 113,
> max_bucket_size = 34
> 
> hits = 497731,
> misses = 363,
> max_bucket_size = 148
> 
> With patches:
> 
> hits = 493634,
> misses = 253,
> max_bucket_size = 85
> 
> hits = 495667,
> misses = 237,
> max_bucket_size = 83
> 
> hits = 454738,
> misses = 358,
> max_bucket_size = 132
> 
> Some of the numbers, as you can see, are rather noisy but the end result is
> about the same: we get at least 1000x as many cache hits as misses when
> running that test.  I don't think the choice to recycle busy BOs is really
> gaining us anything whatsoever.  It is worth noting that I did both sets
> of runs in debug builds because I had to use gdb to get the data back
> out of the driver (prints inside the GL driver used by glamor don't work
> too well).  That probably affected things a bit but I doubt the end result
> would have been that much different.
> 
> Which raises the question: why does Michel see such a big difference
> on radeon?

The glamor FBO cache could reuse the temporary FBO even before flushing,
so only one such FBO was ever needed. From what Eric wrote above, it
sounds like the i965 cache can only reuse BOs after a flush, so it makes
relatively little difference whether or not busy BOs are reused.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer