[Mesa-dev] [PATCH v2] glsl: move uniform calculation to link_uniforms

Wed Jan 20 03:02:37 PST 2016

On 01/20/2016 12:45 PM, Ilia Mirkin wrote:
> On Wed, Jan 20, 2016 at 4:44 AM, Tapani Pälli <tapani.palli at intel.com> wrote:
>> On 01/20/2016 11:31 AM, Ilia Mirkin wrote:
>>> On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.palli at intel.com>
>>> wrote:
>>>> On 01/20/2016 11:16 AM, Ilia Mirkin wrote:
>>>>> On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.palli at intel.com>
>>>>> wrote:
>>>>>> On 01/20/2016 10:26 AM, Ilia Mirkin wrote:
>>>>>>> On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.palli at intel.com>
>>>>>>> wrote:
>>>>>>>> On 01/19/2016 01:14 PM, Ilia Mirkin wrote:
>>>>>>>>> The data structure is a (memory) heap... there appears to be one in
>>>>>>>>> mesa/main/mm.h. There's also one in nouveau_heap.h which is quite
>>>>>>>>> simple and totally unreliant on nouveau, just happens to be there.
>>>>>>>>> How
>>>>>>>>> hard would it be to integrate something like that?
>>>>>>>>>
>>>>>>>>> The trouble with adding slow things is that you forget about them,
>>>>>>>>> and
>>>>>>>>> they're not _that_ slow, but this stuff adds up.
>>>>>>>>
>>>>>>>> The solution I had in mind is to build a list of empty slots when
>>>>>>>> allocating
>>>>>>>> remaptable or while finding slots (keep pushing unused empty slots to
>>>>>>>> list)
>>>>>>>> ... but if possible I would prefer optimization later. First of all,
>>>>>>>> this
>>>>>>>> is
>>>>>>>> quite exotic path to hit with a real program (last words ... yes
>>>>>>>> yes).
>>>>>>>> Secondly, and more importantly, we can apply for certification
>>>>>>>> sooner,
>>>>>>>> there
>>>>>>>> are very few failures left.
>>>>>>> I see you pushed this patch without concluding this discussion.
>>>>>>> Certification may be something that you (personally, as a company,
>>>>>>> whatever) are striving for, but that doesn't mean that you get to
>>>>>>> ignore reviewer feedback.
>>>>>>
>>>>>> I'm sorry if you have that impression but I'm not ignoring review
>>>>>> feedback.
>>>>>> I agree that the find function is not 'optimal' and have planned how to
>>>>>> optimize it and I'm happy with any changes if someone wants to optimize
>>>>>> and
>>>>>> refactor it instead. However, I've noticed this to be not a bottleneck
>>>>>> and
>>>>>> cold path so because of the schedule I'm asking to do this later.
>>>>>>
>>>>>>> Perhaps in the end you're actually right, I don't know, but we
>>>>>>> certainly didn't agree on anything. I'm inclined to push out a revert
>>>>>>> while this is being sorted out.
>>>>>>
>>>>>> I'm surprised to see this as such a big deal.
>>>>>>
>>>>>> // Tapani
>>>>>>
>>>>> The big deal is pushing the patch before concluding the discussion.
>>>>>
>>>>> Getting back to the matter at hand, what's the absolute worst case
>>>>> here? How big does the UniformRemapTable get? How many times can this
>>>>> function get called?
>>>>
>>>> As example with Intel Haswell we have max as 98304, this is the biggest
>>>> size
>>>> with HSW.
>>>>
>>>> This function gets called only if the remaptable has 'holes' in it,
>>>> meaning
>>>> that explicit uniforms locations get scattered in this available space, I
>>>> consider this very rare for anyone or some engine to do. It could only
>>>> really happen if you use both explicit locations (non continuous
>>>> locations)
>>>> and implicit locations together.
>>> So... what's the worst case? What would that test look like? How long
>>> would it take to execute?
>>
>> A shader that has max amount of uniforms possible. They need to be invidual
>> (not arrays or structs), declare half of them with explicit location so that
>> they fill every other location (leaving as much holes as possible), this
>> should be the worst case.
> I played around a bunch with this. I couldn't trigger the badness, and
> then I realized it's because you have a bug: You need to use "j - i <
> entries", otherwise you end up picking the wrong location.

True, thanks for finding this out.

> I also hit another issue when trying to create a bad shader, with
> something like:
> layout (location=0) uniform Foo {
> layout (location=0) float a;
> vec4 ff[2048];
> };
>
> I end up hitting
>
> link_uniforms.cpp:1241: void
> link_assign_uniform_locations(gl_shader_program*, unsigned int,
> unsigned int): Assertion `prog->UniformRemapTable[element_loc] ==
> ((gl_uniform_storage *) -1)' failed.
>
> Not sure if that's related to your work or not.

This looks like unrelated bug, we should probably catch this in parser 
already (?)

> However it dawns on me that I had misunderstood your algorithm. I
> thought it was triggering O(n^2) behaviour *for each search*, but it
> is not -- it's, in essence, a linear scan over the remap table. There
> could, of course, be a bunch of these scans. And you skip the process
> if the remap table is complete, so it really has to be a bunch of
> holes in the thing.
>
> Of course the size is finite, so at worst you'd have a remap table of
> size N with N/2 holes, and you'd end up scanning up to the first hole
> N/2 times. Which is roughly N^2/4, which is still 4M iterations for N
> = 4096.
>
> Unfortunately putting such a shader together is a bit of a pain, since
> all the uniforms have to be used. I still really think you need to
> build a heap. Or at least store a "first empty slot" so that you don't
> repeat the iteration *every* time.

Yeah, I agree "first empty slot" would be good addition and will already 
help a lot in problematic cases.

// Tapani