[Mesa-dev] [PATCH 0/8] i965: Don't recycle BOs until they are idle

Fri Jun 15 08:14:17 UTC 2018

On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
> On Thu, Jun 14, 2018 at 10:55 AM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
>> On June 14, 2018 01:43:12 Michel Dänzer <michel at daenzer.net> wrote:
>>> On 2018-06-13 10:26 PM, Jason Ekstrand wrote:
>>>
>>>> The current BO cache puts BOs back into the recycle bucket the moment the
>>>> refcount hits zero.  If the BO is busy, we just don't re-use it until it
>>>> isn't or we re-use it for a render target which we assume will be used
>>>> first for drawing.  This patch series reworks the way the BO cache works
>>>> a bit so that we don't ever recycle a busy BO.  On the down side, it means
>>>> that we don't get the "keep busy BOs busy" heuristic (which we have no
>>>> proof actually helps).  On the up side, we can now easily use an MRU
>>>> heuristic instead of round-robin for all buffers and not just the busy
>>>> ones.  Will this be an improvement, a regression or a wash?  I don't know
>>>> but I doubt it will have a major effect one way or another.
>>>>
>>>
>>> FWIW, I suspect this could be a significant loss with overlapping copies
>>> in glamor (e.g. x11perf -copywinwin500), because it won't be able to
>>> reuse the busy BOs anymore (glamor creates a temporary FBO for each
>>> overlapping copy).
>>>
>>
>> That's rather horrific... That seems like something glamor could do
>> better.

As of xserver 1.20, glamor can use GL_MESA_tile_raster_order if
available.

>>  How common are overlapping copies in practice?  Are we talking a
>> couple per frame or hundreds?

X doesn't have a "frame" concept per se, but overlapping copies can be
quite common e.g. when scrolling, or moving windows without a
compositor.

> I did some testing and x11perf -copywinwin500 is... exactly the same with
> or without my patches.  If anything they might improve it by just a hair.

Possible explanations I can think of:

1. Your glamor still has its own FBO cache. Which version of xserver are
you testing with?

2. The i965 driver cache isn't hit even before these changes.

3. Allocating BOs from the kernel is significantly cheaper with i915 vs
amdgpu.

(4. Your GPU is too slow for it to matter. What kind of numbers are you
getting?)

FWIW, on a Radeon R9 285 I get

     360000 trep @   0.0257 msec ( 38900.0/sec): Copy 500x500 from window to window

with glamor's FBO cache and

     240000 trep @   0.0700 msec ( 14300.0/sec): Copy 500x500 from window to window

without (radeonsi's cache doesn't reclaim BOs either until they are
idle), i.e. slower by almost a factor of 3 (0.0700 / 0.0257 is about 2.7).
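
To make the concern about temporary BOs concrete, here is a toy model
in C. It is only a sketch under stated assumptions, not code from i965,
radeonsi or glamor: the function simulate_copies(), the fixed "GPU lag"
measured in copies, and the MAX_BOS limit are all hypothetical. Each
simulated copy allocates one temporary BO, submits work that keeps it
busy for gpu_lag further copies, and releases it to the cache right
away. The function counts how many BOs had to be allocated fresh
because nothing in the cache was usable.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BOS 64 /* hypothetical upper bound for this toy model */

/*
 * Simulate `copies` back-to-back copies, each needing one temporary BO.
 * A BO used at tick t is assumed to stay busy until tick t + gpu_lag.
 * allow_busy = true  models a cache that hands out busy BOs again,
 * allow_busy = false models a cache that only hands out idle BOs.
 * Returns the number of fresh (non-cached) allocations.
 */
unsigned simulate_copies(unsigned copies, unsigned gpu_lag, bool allow_busy)
{
    unsigned idle_at[MAX_BOS]; /* tick at which each cached BO goes idle */
    unsigned cached = 0;       /* cached BOs; the last entry is the MRU one */
    unsigned fresh = 0;

    for (unsigned tick = 0; tick < copies; tick++) {
        int pick = -1;

        /* Look for a usable BO, most recently released first. */
        for (int i = (int)cached - 1; i >= 0; i--) {
            if (allow_busy || idle_at[i] <= tick) {
                pick = i;
                break;
            }
        }

        if (pick < 0) {
            /* Nothing usable in the cache: allocate a new BO. */
            assert(cached < MAX_BOS);
            fresh++;
        } else {
            /* Take the BO out of the cache, keeping release order. */
            for (unsigned i = (unsigned)pick; i + 1 < cached; i++)
                idle_at[i] = idle_at[i + 1];
            cached--;
        }

        /* Use the BO for this copy and release it straight away. */
        idle_at[cached++] = tick + gpu_lag;
    }

    return fresh;
}
```

In this model, simulate_copies(10, 3, true) needs a single BO for all
ten copies, while simulate_copies(10, 3, false) needs three before the
oldest one goes idle and can be reused. The model says nothing about
how expensive a fresh allocation is, which is exactly where drivers and
kernels may differ.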

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer