[Mesa-dev] [RFC] dynamic IB size tuning for radeonsi

Nicolai Hähnle nhaehnle at gmail.com
Sun Apr 17 18:13:27 UTC 2016


On 15.04.2016 12:50, Grigori Goronzy wrote:
> Apps that cause a lot of synchronization benefit from small IB
> sizes. The current IB size is a bit on the large side for this class
> of apps. On the other hand, if there isn't much synchronization going
> on, increasing the IB size can slightly improve performance, too.
>
> Here's a quick hack that tunes the IB size based on feedback from
> buffer_wait_time. What do you think? I see good results with Unigine
> Heaven (no synchronization, benefits from larger IB size), Metro Last
> Light (lots of synchronization, benefits from small IBs) as well as
> OpenArena and Xonotic (same).

Interesting, and thanks for poking at this issue. I've been thinking 
about tuning IB sizes as well. I'd like for us to get this right, so I 
wonder: What's your theory for _why_ your change helps?

I'll be honest with you: Right now, I think your approach contains too 
much unexplained "magic". What's the theory that explains using buffer 
wait averages in this way?

My theory for why your change helps is about CPU/GPU parallelism. When 
we wait for buffer idle, this most likely means the GPU becomes idle.[1] 
If you use a large IB to start the GPU up again, you'll wait a longer 
time before the GPU starts doing work again. Basically, in ASCII art:

                 GPU idle
GPU =========+..............+=====
       |      |              |
CPU ==+......+==============+=====
        buffer
         wait

By reducing the size of the IB, the picture changes like this:

              GPU idle
GPU =========+......+=====
       |      |      |
CPU ==+......+======+=====
        buffer
        wait

It takes a shorter amount of CPU time before the GPU gets new work, the 
GPU is utilized more fully and the program runs faster.
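The diagrams can be turned into a toy timeline model. This is a sketch under stated assumptions, not radeonsi code: it assumes the CPU fills the next IB up to its size limit before submitting it, at a constant cost per dword, and that the GPU is idle from the end of the buffer wait until that submission. The function names are hypothetical.

```c
#include <assert.h>

/* Toy timeline model (hypothetical, not radeonsi code). Assumes the CPU
 * fills the next IB up to its size limit at a constant cost per dword. */

/* GPU idle time after a buffer wait: the GPU gets no new work until the
 * whole next IB has been built and submitted. */
static double gpu_idle_after_wait_us(unsigned ib_dwords, double cpu_us_per_dword)
{
    assert(cpu_us_per_dword >= 0.0);
    return (double)ib_dwords * cpu_us_per_dword;
}

/* Fraction of one wait/refill cycle during which the GPU is busy, given
 * gpu_busy_us of GPU work between consecutive buffer waits. */
static double gpu_utilization(double gpu_busy_us, double idle_us)
{
    assert(gpu_busy_us + idle_us > 0.0);
    return gpu_busy_us / (gpu_busy_us + idle_us);
}
```

In this model, at 0.5 us per dword and 2048 us of GPU work per cycle, shrinking the IB from 4096 to 1024 dwords cuts the idle gap from 2048 us to 512 us and raises utilization from 50% to 80%.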

If this explanation is correct and all there is to it, then it suggests 
the logic for deciding when IBs should be shorter. Basically, we should use short 
IBs when the GPU is idle.[2]

There are a bunch of different options. A simple one that comes closest 
to what your patch does - without actually querying for GPU idle - is to 
just make the first IB after each buffer wait a small one. The length of 
the buffer wait doesn't seem important because what we need to address 
is the fact that the GPU is idle. That's a boolean matter.
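The "first IB after each buffer wait is small" option only needs a flag, since only the fact that a wait happened matters, not its length. A minimal sketch, assuming hypothetical hook names and hypothetical size limits (real values would need tuning):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IB size limits, in dwords. */
#define SMALL_IB_DWORDS (4 * 1024)
#define LARGE_IB_DWORDS (64 * 1024)

struct ib_size_state {
    bool waited_since_last_flush; /* a buffer wait happened during this IB */
};

/* Hypothetical hook in the buffer-wait path. The wait duration is ignored
 * on purpose: the GPU being idle is a boolean matter. */
static void ib_note_buffer_wait(struct ib_size_state *s)
{
    assert(s);
    s->waited_since_last_flush = true;
}

/* Size limit for the IB currently being built. */
static unsigned ib_current_limit(const struct ib_size_state *s)
{
    assert(s);
    return s->waited_since_last_flush ? SMALL_IB_DWORDS : LARGE_IB_DWORDS;
}

/* Hypothetical hook called after an IB has been submitted. */
static void ib_note_flush(struct ib_size_state *s)
{
    assert(s);
    s->waited_since_last_flush = false;
}
```

If the wait path flushes the current IB before waiting, the order is flush, then wait, so the IB built right after the wait gets the small limit and the next flush restores the large one.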

Because of [1] it would probably be a better approach to use fences to 
determine whether or how many previous IBs are still in flight.
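A fence-based variant could count how many submitted IBs have not completed yet. The sketch below stands in for real fence objects with a pair of sequence numbers (one per submitted IB, plus the last one the GPU reports as finished); the struct, the functions and the low_water threshold are all hypothetical, not an existing winsys interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IB size limits, in dwords. */
#define SMALL_IB_DWORDS (4 * 1024)
#define LARGE_IB_DWORDS (64 * 1024)

/* Stand-in for fences: each submitted IB gets an increasing sequence
 * number, and the GPU signals the last sequence number it finished. */
struct ib_tracker {
    uint64_t last_submitted_seq;
    uint64_t last_completed_seq;
};

static unsigned ibs_in_flight(const struct ib_tracker *t)
{
    assert(t->last_completed_seq <= t->last_submitted_seq);
    return (unsigned)(t->last_submitted_seq - t->last_completed_seq);
}

/* Use a short IB when the GPU is idle or about to run out of work.
 * With low_water = 1, "about to become idle" means at most one IB is
 * still queued. */
static unsigned ib_limit_from_fences(const struct ib_tracker *t, unsigned low_water)
{
    return ibs_in_flight(t) <= low_water ? SMALL_IB_DWORDS : LARGE_IB_DWORDS;
}
```

Unlike the buffer-wait flag, this does not assume that a wait implies an idle GPU, which covers the case in [1] where the waited-on buffer is only referenced by an older IB.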

Cheers,
Nicolai

[1] Although not necessarily. We may be trying to map a buffer that is 
still in flight, but only referenced by an older IB.

[2] Or about to become idle, since we want to keep the pipeline full. 
Neither applies, though, in the rare case where the CPU driver 
overhead for constructing the IBs is consistently higher than the GPU 
work that needs to be done for those IBs. In that case, we should still 
use large IBs to reduce the driver overhead.
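The exception in [2] could be folded into the size decision. A minimal sketch, assuming the driver keeps smoothed per-IB estimates of CPU build time and GPU execution time (how to measure them is left open); the exponentially weighted moving average stands in for "consistently", and every name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IB size limits, in dwords. */
#define SMALL_IB_DWORDS (4 * 1024)
#define LARGE_IB_DWORDS (64 * 1024)

/* Hypothetical smoothed per-IB costs. */
struct ib_cost_stats {
    double avg_cpu_build_us;
    double avg_gpu_exec_us;
};

/* Exponentially weighted moving average; alpha in (0, 1] weights the
 * newest sample. */
static double ewma(double avg, double sample, double alpha)
{
    assert(alpha > 0.0 && alpha <= 1.0);
    return avg + alpha * (sample - avg);
}

static unsigned choose_ib_limit(const struct ib_cost_stats *st, bool gpu_idle_or_nearly)
{
    /* CPU-bound: driver overhead dominates, so amortize it with large IBs
     * even when the GPU is idle. */
    if (st->avg_cpu_build_us > st->avg_gpu_exec_us)
        return LARGE_IB_DWORDS;
    return gpu_idle_or_nearly ? SMALL_IB_DWORDS : LARGE_IB_DWORDS;
}
```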

> Note: this patch applies on top of Bas' constant engine patchset.
>
> Grigori
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>

