<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 18, 2018 at 2:14 AM, Michel Dänzer <span dir="ltr"><<a href="mailto:michel@daenzer.net" target="_blank">michel@daenzer.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 2018-06-16 08:23 AM, Jason Ekstrand wrote:<br>
> On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <<a href="mailto:eric@anholt.net">eric@anholt.net</a>> wrote:<br>
> <br>
>> Michel Dänzer <<a href="mailto:michel@daenzer.net">michel@daenzer.net</a>> writes:<br>
>><br>
>>> On 2018-06-15 05:25 PM, Jason Ekstrand wrote:<br>
>>>> On June 15, 2018 01:14:24 Michel Dänzer <<a href="mailto:michel@daenzer.net">michel@daenzer.net</a>> wrote:<br>
>>>>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:<br>
>>>>>><br>
>>>>>> I did some testing and x11perf -copywinwin500 is... exactly the same<br>
>>>>>> with or without my patches. If anything they might improve it by just<br>
>>>>>> a hair.<br>
>>>>><br>
>>>>> Possible explanations I can think of:<br>
>>>>><br>
>>>>> 1. Your glamor still has its own FBO cache. Which version of xserver<br>
>>>>> are you testing with?<br>
>>>>><br>
>>>> 1.19 I think<br>
>>><br>
>>> Okay, that doesn't have the glamor FBO cache anymore.<br>
>>><br>
>>><br>
>>>>> 2. The i965 driver cache isn't hit even before these changes.<br>
>>>><br>
>>>> It's definitely getting hit in both cases, it just may require a<br>
>>>> slightly larger cache if we aren't recycling BOs until they're idle.<br>
>>><br>
>>> It might be more than just slightly; -copywinwin500 can queue many<br>
>>> overlapping copies between flushes. Can you compare the maximum total<br>
>>> cache size with and without this series?<br>
>><br>
>> I suspect it'll be only about a factor of<br>
>> how-many-batchbuffers-before-<wbr>throttling difference -- while the<br>
>> batchbuffer still references the BO, the bufmgr wouldn't see the buffer<br>
>> to reuse it anyway. I suspect we hit the aperture limit and flush in<br>
>> the copywinwin500 case.<br>
>><br>
> <br>
> At Ken's suggestion, I ran some statistics for hits/misses. I did three<br>
> runs each with master and with my branch:<br>
> <br>
> Master:<br>
> <br>
> hits = 455868,<br>
> misses = 388,<br>
> max_bucket_size = 160<br>
> <br>
> hits = 404358,<br>
> misses = 113,<br>
> max_bucket_size = 34<br>
> <br>
> hits = 497731,<br>
> misses = 363,<br>
> max_bucket_size = 148<br>
> <br>
> With patches:<br>
> <br>
> hits = 493634,<br>
> misses = 253,<br>
> max_bucket_size = 85<br>
> <br>
> hits = 495667,<br>
> misses = 237,<br>
> max_bucket_size = 83<br>
> <br>
> hits = 454738,<br>
> misses = 358,<br>
> max_bucket_size = 132<br>
> <br>
> Some of the numbers, as you can see, are rather noisy, but the end result is<br>
> about the same: we get at least 1000x as many cache hits as misses when<br>
> running that test. I don't think the choice to recycle busy BOs is really<br>
> gaining us anything whatsoever. It is worth noting that I did all of<br>
> those runs in debug builds because I had to use gdb to get the data back<br>
> out of the driver (prints inside the GL driver used by glamor don't work<br>
> too well). That probably affected things a bit, but I doubt the end result<br>
> would have been that much different.<br>
> <br>
> Which raises the question: why does Michel see such a big difference on<br>
> radeon?<br>
<br>
</div></div>The glamor FBO cache could reuse the temporary FBO even before flushing,<br>
so only one such FBO was ever needed. From what Eric wrote above, it<br>
sounds like the i965 cache can only reuse BOs after a flush, so there's<br>
relatively little difference between reusing busy BOs or not.<br>
</blockquote></div></div><div class="gmail_extra"><br></div><div class="gmail_extra">It occurred to me today, while talking to Jordan about this stuff, that X may not be getting busy BO reuse. We generally only allocate busy BOs for renderbuffers. For textures, we expect there's a decent chance we'll map them, so we allocate an idle BO. Guess which one modesetting uses! Yup, textures. It wasn't getting the busy BO optimization at all. I hacked up Mesa to use the busy BO path for textures as well, and x11perf -copywinwin500 improved by 25%. It's no 3x, but it's enough to make me think that this patch series may not be such a good idea. :-( On the upside, I now know how to improve an X11 microbenchmark by 25%. :-)<br></div></div>
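<div dir="ltr"><div>A toy model may help make the busy vs. idle allocation paths concrete. The Python sketch below is a minimal, hypothetical illustration, not Mesa code: <code>BO</code>, <code>BOCache</code>, <code>simulate()</code> and the <code>retire_every</code> knob are invented names. The bucket policy is an assumption loosely based on the renderbuffer vs. texture behaviour described above: busy-OK allocations take the most recently freed BO even if the GPU still uses it, while idle-only allocations reuse the oldest cached BO only if it has already gone idle.</div>
<pre>
```python
"""Toy model of busy vs. idle BO reuse in a size-bucketed cache.

Hypothetical sketch, not Mesa code: BO, BOCache and simulate() are
invented names, and the bucket policy is an assumption.
"""
from dataclasses import dataclass


@dataclass
class BO:
    size: int
    busy: bool = False  # assumed flag: True while the GPU still uses it


class BOCache:
    def __init__(self):
        self.buckets = {}  # size -> freed BOs, oldest first
        self.hits = 0
        self.misses = 0

    def free(self, bo):
        # Freed BOs are appended, so the tail is the most recently freed.
        self.buckets.setdefault(bo.size, []).append(bo)

    def alloc(self, size, busy_ok):
        bucket = self.buckets.setdefault(size, [])
        if busy_ok and bucket:
            # Renderbuffer-style path (assumed): reuse the most recently
            # freed BO even if it is still busy on the GPU.
            self.hits += 1
            return bucket.pop()
        if bucket and not bucket[0].busy:
            # Texture-style path (assumed): reuse only the oldest BO, and
            # only once it is idle, so a later CPU map will not stall.
            self.hits += 1
            return bucket.pop(0)
        # Cache miss: allocate a fresh BO.
        self.misses += 1
        return BO(size)


def simulate(busy_ok, copies=100, retire_every=10, size=4096):
    """Allocate, use and free one temporary BO per copy.

    The GPU is assumed to finish all queued work only every
    `retire_every` copies (a stand-in for a flush/throttle point).
    """
    cache = BOCache()
    in_flight = []
    for i in range(copies):
        bo = cache.alloc(size, busy_ok)
        bo.busy = True       # the queued copy references this BO
        in_flight.append(bo)
        cache.free(bo)       # the driver drops its reference right away
        if (i + 1) % retire_every == 0:
            for b in in_flight:
                b.busy = False  # queued GPU work has completed
            in_flight.clear()
    return cache


if __name__ == "__main__":
    for busy_ok in (True, False):
        c = simulate(busy_ok)
        print(busy_ok, c.hits, c.misses)
```
</pre>
<div>Under these assumptions it prints <code>True 99 1</code> and <code>False 90 10</code>. With busy reuse a single temporary BO serves every copy; the idle-only path misses on each copy until the first retire and ends up caching one BO per copy in flight (10 here), after which it hits just as reliably. In this toy setting the idle-only path mainly costs a larger cache rather than a worse hit rate.</div></div>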