[Mesa-dev] [RFC] dynamic IB size tuning for radeonsi

Sun Apr 17 18:52:59 UTC 2016

> Interesting, and thanks for poking at this issue. I've been thinking
> about tuning IB sizes as well. I'd like for us to get this right, so I
> wonder: What's your theory for _why_ your change helps?
> 

See below. I think you discovered it yourself.

> I'll be honest with you: Right now, I think your approach contains too
> much unexplained "magic". What's the theory that explains using buffer
> wait averages in this way?
> 

I agree that there is too much magic; e.g. the buffer-wait-time cutoff 
for small IBs is quite magical and can't be explained well.

> My theory for why your change helps is about CPU/GPU parallelism. When
> we wait for buffer idle, this most likely means the GPU becomes
> idle.[1] If you use a large IB to start the GPU up again, you'll wait
> a longer time before the GPU starts doing work again. Basically, in
> ASCII art:
> 
>                 GPU idle
> GPU =========+..............+=====
>       |      |              |
> CPU ==+......+==============+=====
>        buffer
>         wait
> 
> By reducing the size of the IB, the picture changes like this:
> 
>              GPU idle
> GPU =========+......+=====
>       |      |      |
> CPU ==+......+======+=====
>        buffer
>        wait
> 
> It takes a shorter amount of CPU time before the GPU gets new work,
> the GPU is utilized more fully and the program runs faster.
> 

Yes, that is the basic idea. :)
When we are likely to have to synchronize with the GPU later on, it 
pays off to submit work sooner so that the GPU stays busy most of the 
time, and smaller IBs enforce exactly that.
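To put rough numbers on the ASCII art above, here is a toy cost model. 
It is only a sketch under made-up assumptions, not radeonsi code: the 
GPU is idle at t = 0 right after a buffer wait, the CPU records commands 
at a fixed cost per dword plus a fixed overhead per flush, and the GPU 
runs an IB at a fixed cost per dword once it has been submitted and the 
previous IB is done. The function name and all parameters are 
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model (not Mesa code). The GPU is idle at t = 0.
 * The CPU records `total_dw` dwords, flushing an IB every `ib_dw` dwords;
 * recording costs `cpu_ns_per_dw` per dword plus `flush_ns` per IB.
 * The GPU executes `gpu_ns_per_dw` per dword and can only start an IB
 * after it was submitted and the previous IB has finished.
 * Returns the time at which the GPU finishes the last IB. */
uint64_t finish_time_ns(uint64_t total_dw, uint64_t ib_dw,
                        uint64_t cpu_ns_per_dw, uint64_t gpu_ns_per_dw,
                        uint64_t flush_ns)
{
    assert(ib_dw > 0); /* a zero-sized IB would never make progress */

    uint64_t cpu_t = 0;    /* time at which the current IB is submitted */
    uint64_t gpu_free = 0; /* time at which the GPU finishes its last IB */

    while (total_dw > 0) {
        uint64_t n = total_dw < ib_dw ? total_dw : ib_dw;

        /* CPU records n dwords and flushes the IB. */
        cpu_t += n * cpu_ns_per_dw + flush_ns;

        /* GPU starts when both the IB and the GPU are ready; any gap
         * before that is GPU idle time. */
        uint64_t start = cpu_t > gpu_free ? cpu_t : gpu_free;
        gpu_free = start + n * gpu_ns_per_dw;

        total_dw -= n;
    }
    return gpu_free;
}
```

With 1000 dwords and 1 ns/dword on both sides, a single 1000-dword IB 
finishes at 2000 ns, while 100-dword IBs finish at 1100 ns because the 
GPU starts after the first 100 dwords instead of after all 1000. With an 
assumed 50 ns flush overhead, 10-dword IBs finish at 6010 ns, which is 
why IBs can't simply be made tiny all the time.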

> If this explanation is correct and all there is to it, then it suggests
> the logic for when IBs should be shorter. Basically, we should use
> short IBs when the GPU is idle.[2]
> 

Right, but the problem is to cheaply and reliably determine idleness.

> There are a bunch of different options. A simple one that comes
> closest to what your patch does - without actually querying for GPU
> idle - is to just make the first IB after each buffer wait a small
> one. The length of the buffer wait doesn't seem important because what
> we need to address is the fact that the GPU is idle. That's a boolean
> matter.
> 

Let me give that a try; it sounds like a good idea. In particular, we 
could use *really* small IBs in this case without affecting general 
performance, at least in theory.
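A minimal sketch of the "first IB after a buffer wait is small" idea, 
assuming the driver has a hook where it blocks on a buffer and a hook 
after each IB flush. The struct, the functions and both size thresholds 
are hypothetical placeholders, not existing radeonsi or winsys 
interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flush thresholds in dwords; real values would need tuning. */
#define IB_LIMIT_DEFAULT_DW    16384u
#define IB_LIMIT_AFTER_WAIT_DW 1024u

struct ib_sizer {
    /* Set when we had to wait for a buffer, i.e. the GPU has most likely
     * drained its queue. A boolean: the length of the wait is ignored. */
    bool gpu_probably_idle;
};

/* Call whenever the CPU blocked waiting for a buffer to become idle. */
void ib_sizer_note_buffer_wait(struct ib_sizer *s)
{
    s->gpu_probably_idle = true;
}

/* Flush threshold for the IB currently being recorded. */
uint32_t ib_sizer_limit(const struct ib_sizer *s)
{
    return s->gpu_probably_idle ? IB_LIMIT_AFTER_WAIT_DW : IB_LIMIT_DEFAULT_DW;
}

/* True if an IB holding `used_dw` dwords should be flushed now. */
bool ib_sizer_should_flush(const struct ib_sizer *s, uint32_t used_dw)
{
    return used_dw >= ib_sizer_limit(s);
}

/* Call after each flush: once the small IB is submitted the GPU has work
 * again, so later IBs go back to the default size. */
void ib_sizer_note_flush(struct ib_sizer *s)
{
    s->gpu_probably_idle = false;
}
```

Only the IB recorded right after a wait is shortened; every later IB 
uses the default limit until the next wait.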

For the moment, the slightly "magic" buffer-wait-time approach still 
leads to consistent improvements (I did not see any regressions, 
either), so I'll try to document the magic better in an upcoming patch 
and hope that is acceptable for inclusion.
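For comparison, one possible shape of a wait-time-driven heuristic, 
sketched under loud assumptions: an exponential moving average of 
buffer wait times (smoothing factor 1/8) with a fixed cutoff above which 
small IBs are used. The smoothing factor, the 0.1 ms cutoff and all 
names are made-up placeholders, i.e. exactly the kind of magic constants 
discussed above; this is not the code from the RFC patch.

```c
#include <stdint.h>

/* Hypothetical cutoff: average waits above 0.1 ms select small IBs. */
#define WAIT_CUTOFF_NS 100000u

struct wait_stats {
    uint64_t avg_wait_ns; /* exponential moving average of buffer waits */
};

/* Fold one buffer wait into the average with weight 1/8 (integer math,
 * so the result is truncated). */
void wait_stats_add(struct wait_stats *w, uint64_t wait_ns)
{
    w->avg_wait_ns = (7 * w->avg_wait_ns + wait_ns) / 8;
}

/* Pick the IB flush threshold from the recent wait history. */
uint32_t ib_limit_from_waits(const struct wait_stats *w,
                             uint32_t small_dw, uint32_t large_dw)
{
    return w->avg_wait_ns > WAIT_CUTOFF_NS ? small_dw : large_dw;
}
```

Unlike the boolean variant, the outcome depends on the wait lengths and 
on two tunable constants, which is what makes it hard to explain.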

Grigori

PS: "About to become idle" is probably hard to measure, so the small-IB 
approach may have some merit even if we could easily check idleness.