<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <span dir="ltr"><<a href="mailto:eric@anholt.net" target="_blank">eric@anholt.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">Michel Dänzer <<a href="mailto:michel@daenzer.net">michel@daenzer.net</a>> writes:<br>
<br>
> On 2018-06-15 05:25 PM, Jason Ekstrand wrote:<br>
>> On June 15, 2018 01:14:24 Michel Dänzer <<a href="mailto:michel@daenzer.net">michel@daenzer.net</a>> wrote:<br>
>>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:<br>
>>>><br>
>>>> I did some testing and x11perf -copywinwin500 is... exactly the same<br>
>>>> with<br>
>>>> or without my patches. If anything they might improve it by just a<br>
>>>> hair.<br>
>>><br>
>>> Possible explanations I can think of:<br>
>>><br>
>>> 1. Your glamor still has its own FBO cache. Which version of xserver are<br>
>>> you testing with?<br>
>>><br>
>> 1.19 I think<br>
><br>
> Okay, that doesn't have the glamor FBO cache anymore.<br>
><br>
><br>
>>> 2. The i965 driver cache isn't hit even before these changes.<br>
>> <br>
>> It's definitely getting hit in both cases, it just may require a<br>
>> slightly larger cache if we aren't recycling BOs until they're idle.<br>
><br>
> It might be more than just slightly, -copywinwin500 can queue many<br>
> overlapping copies between flushes. Can you compare the maximum total<br>
> cache size with and without this series?<br>
<br>
</span>I suspect it'll be only about a factor of<br>
how-many-batchbuffers-before-<wbr>throttling difference -- while the<br>
batchbuffer still references the BO, the bufmgr wouldn't see the buffer<br>
to reuse it anyway. I suspect we hit the aperture limit and flush in<br>
the copywinwin500 case.<br>
</blockquote></div></div><div class="gmail_extra"><br></div><div class="gmail_extra">At Ken's suggestion, I gathered some hit/miss statistics. I did three runs each with master and with my branch:</div><div class="gmail_extra"><br>Master:<br><br>hits = 455868,<br>misses = 388,<br>max_bucket_size = 160<br><br>hits = 404358,<br>misses = 113,<br>max_bucket_size = 34<br><br>hits = 497731,<br>misses = 363,<br>max_bucket_size = 148<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">With patches:<br><br>hits = 493634,<br>misses = 253,<br>max_bucket_size = 85<br><br>hits = 495667,<br>misses = 237,<br>max_bucket_size = 83<br><br>hits = 454738,<br>misses = 358,<br>max_bucket_size = 132<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Some of the numbers, as you can see, are rather noisy, but the end result is about the same: we get at least 1000x as many cache hits as misses when running that test. I don't think the choice to recycle busy BOs is really gaining us anything whatsoever. It is worth noting that I did both sets of runs in debug builds because I had to use gdb to get the data back out of the driver (prints inside the GL driver used by glamor don't work too well). That probably affected things a bit, but I doubt the end result would have been much different.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Which raises the question: why does Michel see such a big difference on radeon? Is there something else that's causing the slow-down? Is recomputing surface layouts expensive? Is there more VMA shuffling that's causing problems?</div></div>