[Mesa-dev] [PATCH v2] glsl: move uniform calculation to link_uniforms

Wed Jan 20 02:45:22 PST 2016

On Wed, Jan 20, 2016 at 4:44 AM, Tapani Pälli <tapani.palli at intel.com> wrote:
> On 01/20/2016 11:31 AM, Ilia Mirkin wrote:
>>
>> On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.palli at intel.com>
>> wrote:
>>>
>>> On 01/20/2016 11:16 AM, Ilia Mirkin wrote:
>>>>
>>>> On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.palli at intel.com>
>>>> wrote:
>>>>>
>>>>> On 01/20/2016 10:26 AM, Ilia Mirkin wrote:
>>>>>>
>>>>>> On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.palli at intel.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 01/19/2016 01:14 PM, Ilia Mirkin wrote:
>>>>>>>>
>>>>>>>> The data structure is a (memory) heap... there appears to be one in
>>>>>>>> mesa/main/mm.h. There's also one in nouveau_heap.h which is quite
>>>>>>>> simple and totally unreliant on nouveau, just happens to be there.
>>>>>>>> How
>>>>>>>> hard would it be to integrate something like that?
>>>>>>>>
>>>>>>>> The trouble with adding slow things is that you forget about them,
>>>>>>>> and
>>>>>>>> they're not _that_ slow, but this stuff adds up.
>>>>>>>
>>>>>>>
>>>>>>> The solution I had in mind is to build a list of empty slots when
>>>>>>> allocating
>>>>>>> remaptable or while finding slots (keep pushing unused empty slots to
>>>>>>> list)
>>>>>>> ... but if possible I would prefer optimization later. First of all,
>>>>>>> this
>>>>>>> is
>>>>>>> quite exotic path to hit with a real program (last words ... yes
>>>>>>> yes).
>>>>>>> Secondly, and more importantly, we can apply for certification
>>>>>>> sooner,
>>>>>>> there
>>>>>>> are very few failures left.
>>>>>>
>>>>>> I see you pushed this patch without concluding this discussion.
>>>>>> Certification may be something that you (personally, as a company,
>>>>>> whatever) are striving for, but that doesn't mean that you get to
>>>>>> ignore reviewer feedback.
>>>>>
>>>>>
>>>>> I'm sorry if you have that impression but I'm not ignoring review
>>>>> feedback.
>>>>> I agree that the find function is not 'optimal' and have planned how to
>>>>> optimize it and I'm happy with any changes if someone wants to optimize
>>>>> and
>>>>> refactor it instead. However, I've noticed this to be not a bottleneck
>>>>> and
>>>>> cold path so because of the schedule I'm asking to do this later.
>>>>>
>>>>>> Perhaps in the end you're actually right, I don't know, but we
>>>>>> certainly didn't agree on anything. I'm inclined to push out a revert
>>>>>> while this is being sorted out.
>>>>>
>>>>>
>>>>> I'm surprised to see this as such a big deal.
>>>>>
>>>>> // Tapani
>>>>>
>>>> The big deal is pushing the patch before concluding the discussion.
>>>>
>>>> Getting back to the matter at hand, what's the absolute worst case
>>>> here? How big does the UniformRemapTable get? How many times can this
>>>> function get called?
>>>
>>>
>>> As example with Intel Haswell we have max as 98304, this is the biggest
>>> size
>>> with HSW.
>>>
>>> This function gets called only if the remaptable has 'holes' in it,
>>> meaning
>>> that explicit uniforms locations get scattered in this available space, I
>>> consider this very rare for anyone or some engine to do. It could only
>>> really happen if you use both explicit locations (non continuous
>>> locations)
>>> and implicit locations together.
>>
>> So... what's the worst case? What would that test look like? How long
>> would it take to execute?
>
>
> A shader that has max amount of uniforms possible. They need to be invidual
> (not arrays or structs), declare half of them with explicit location so that
> they fill every other location (leaving as much holes as possible), this
> should be the worst case.

I played around a bunch with this. I couldn't trigger the badness, and
then I realized it's because you have a bug: You need to use "j - i <
entries", otherwise you end up picking the wrong location.

I also hit another issue when trying to create a bad shader, with
something like:
layout (location=0) uniform Foo {
layout (location=0) float a;
vec4 ff[2048];
};

I end up hitting

link_uniforms.cpp:1241: void
link_assign_uniform_locations(gl_shader_program*, unsigned int,
unsigned int): Assertion `prog->UniformRemapTable[element_loc] ==
((gl_uniform_storage *) -1)' failed.

Not sure if that's related to your work or not.

However it dawns on me that I had misunderstood your algorithm. I
thought it was triggering O(n^2) behaviour *for each search*, but it
is not -- it's, in essence, a linear scan over the remap table. There
could, of course, be a bunch of these scans. And you skip the process
if the remap table is complete, so it really has to be a bunch of
holes in the thing.

Of course the size is finite, so at worst you'd have a remap table of
size N with N/2 holes, and you'd end up scanning up to the first hole
N/2 times. Which is roughly N^2/4, which is still 4M iterations for N
= 4096.

Unfortunately putting such a shader together is a bit of a pain, since
all the uniforms have to be used. I still really think you need to
build a heap. Or at least store a "first empty slot" so that you don't
repeat the iteration *every* time.

  -ilia