[Mesa-dev] [PATCH] st/mesa: use u_bit_scan64() on 64-bit CPUs

Fri Oct 21 15:24:40 UTC 2016

On Fri, Oct 21, 2016 at 4:41 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> On 21.10.2016 at 15:17, Marek Olšák wrote:
>> On Oct 21, 2016 12:06 PM, "Jan Ziak" <0xe2.0x9a.0x9b at gmail.com> wrote:
>>>
>>> On Fri, Oct 21, 2016 at 12:04 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>> > This won't make it faster.
>>>
>>> Why?
>>
>> It's obviously a micro-optimization whose added code outweighs any
>> runtime benefit. I don't think that real performance improvements
>> will be so simple and obvious.
>>
>> Marek
>>
>
> Still, shouldn't it be faster though, even if just very very minimally so?

If at least one of the atoms gets updated, the overhead of the update
function will be so high that the couple of instructions saved in the
while loop will be unmeasurable.
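
To make the comparison concrete, here is a minimal sketch of the two
loop shapes being discussed. It assumes the dirty flags live in one
64-bit mask that is walked either as two 32-bit halves or in a single
64-bit pass. The names bit_scan32, bit_scan64, visit_dirty_2x32,
visit_dirty_1x64 and record_atom are hypothetical stand-ins and not the
actual Mesa code; __builtin_ctz and __builtin_ctzll are GCC/Clang
builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for bit-scan helpers: return the index of the
 * lowest set bit and clear that bit in *mask. */
static inline int bit_scan32(uint32_t *mask)
{
   assert(*mask != 0);              /* ctz of 0 is undefined */
   int i = __builtin_ctz(*mask);
   *mask &= *mask - 1;              /* clear the lowest set bit */
   return i;
}

static inline int bit_scan64(uint64_t *mask)
{
   assert(*mask != 0);
   int i = __builtin_ctzll(*mask);
   *mask &= *mask - 1;
   return i;
}

/* Hypothetical atom callback. A real update function does far more
 * work than the loop bookkeeping around it. */
typedef void (*atom_update_fn)(int atom_index, void *user);

/* Example callback: records which atom indices were visited. */
static void record_atom(int atom_index, void *user)
{
   *(uint64_t *)user |= (uint64_t)1 << atom_index;
}

/* Variant A: walk the 64-bit mask as two 32-bit halves. */
static int visit_dirty_2x32(uint64_t dirty, atom_update_fn update, void *user)
{
   uint32_t lo = (uint32_t)dirty;
   uint32_t hi = (uint32_t)(dirty >> 32);
   int count = 0;

   while (lo) { update(bit_scan32(&lo), user); count++; }
   while (hi) { update(32 + bit_scan32(&hi), user); count++; }
   return count;
}

/* Variant B: one loop over the full 64-bit mask. */
static int visit_dirty_1x64(uint64_t dirty, atom_update_fn update, void *user)
{
   int count = 0;

   while (dirty) { update(bit_scan64(&dirty), user); count++; }
   return count;
}
```

Both variants call the update function the same number of times with
the same indices. They differ only in loop bookkeeping, which is the
part that gets lost in the noise once any update function actually runs.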

Yes, st_validate_state is pretty huge in profilers. However, the
problem is not only in st_validate_state and the atoms, but also in
most of mesa/main that sets the _NEW_* flags, which in turn invokes
more state updates than needed. That's a big problem of mesa/main and
quite a complex one to solve. It's also partly a Gallium design
problem, because OpenGL doesn't have per-stage shader resource slots
like DX11 has. Instead, slots for textures and similar resources are
global, e.g. texture unit 0 is slot 0 in all shader stages.
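
As a rough illustration of the slot-model difference, the sketch below
contrasts a per-stage binding table with a global one. Everything in it
(the struct names, NUM_STAGES, NUM_SLOTS, the bind_* functions, and
integer texture handles) is hypothetical and much simpler than real
Gallium or mesa/main code. It also assumes no extra tracking of which
stages actually sample from a given unit, so a global bind is treated
as affecting every stage.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_STAGES = 3, NUM_SLOTS = 8 };   /* hypothetical sizes */

/* DX11-style model: every shader stage owns its own slot table. */
struct per_stage_bindings {
   int texture[NUM_STAGES][NUM_SLOTS];
};

/* GL-style model: one table shared by all shader stages. */
struct global_bindings {
   int texture[NUM_SLOTS];
};

/* Bind into one stage's slot. Returns a bitmask of the stages whose
 * bindings need to be re-validated: only the stage that was touched. */
static uint32_t bind_per_stage(struct per_stage_bindings *b,
                               int stage, int slot, int tex)
{
   assert(stage >= 0 && stage < NUM_STAGES);
   assert(slot >= 0 && slot < NUM_SLOTS);
   b->texture[stage][slot] = tex;
   return 1u << stage;
}

/* Bind into a global slot. Without knowing which stages use the unit,
 * every stage has to be treated as affected. */
static uint32_t bind_global(struct global_bindings *b, int slot, int tex)
{
   assert(slot >= 0 && slot < NUM_SLOTS);
   b->texture[slot] = tex;
   return (1u << NUM_STAGES) - 1;
}
```

In the per-stage model a single bind marks one stage. In the global
model the same bind marks all of them unless additional bookkeeping
narrows it down.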

This is going into some deep design issues and even if we had a
perfect design, how much performance could we get with that? Isn't a
threaded GL dispatch 100x easier than rewriting Mesa?

Marek