[Mesa-dev] [PATCH v2] glsl: move uniform calculation to link_uniforms
Tapani Pälli
tapani.palli at intel.com
Wed Jan 20 01:44:05 PST 2016
On 01/20/2016 11:31 AM, Ilia Mirkin wrote:
> On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.palli at intel.com> wrote:
>> On 01/20/2016 11:16 AM, Ilia Mirkin wrote:
>>> On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.palli at intel.com>
>>> wrote:
>>>> On 01/20/2016 10:26 AM, Ilia Mirkin wrote:
>>>>> On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.palli at intel.com>
>>>>> wrote:
>>>>>> On 01/19/2016 01:14 PM, Ilia Mirkin wrote:
>>>>>>> The data structure is a (memory) heap... there appears to be one in
>>>>>>> mesa/main/mm.h. There's also one in nouveau_heap.h which is quite
>>>>>>> simple and doesn't actually rely on nouveau at all; it just happens
>>>>>>> to live there. How hard would it be to integrate something like that?
>>>>>>>
>>>>>>> The trouble with adding slow things is that you forget about them;
>>>>>>> each one is not _that_ slow on its own, but this stuff adds up.
>>>>>>
>>>>>> The solution I had in mind is to build a list of empty slots when
>>>>>> allocating the remap table, or while finding slots (keep pushing
>>>>>> unused empty slots onto the list); a rough sketch of that idea
>>>>>> follows below. But if possible I would prefer to optimize later.
>>>>>> First of all, this is quite an exotic path to hit with a real
>>>>>> program (last words ... yes yes). Secondly, and more importantly,
>>>>>> we can apply for certification sooner; there are very few failures
>>>>>> left.
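
A minimal sketch of that free-slot list idea, with hypothetical names
(just an illustration of the approach, not actual Mesa code):

   #include <stdlib.h>

   /* Hypothetical list of unused UniformRemapTable indices. */
   struct empty_slot_list {
      unsigned *slots;
      unsigned count;
      unsigned capacity;
   };

   static void
   push_empty_slot(struct empty_slot_list *list, unsigned idx)
   {
      if (list->count == list->capacity) {
         unsigned new_cap = list->capacity ? list->capacity * 2 : 16;
         unsigned *tmp = realloc(list->slots, new_cap * sizeof(unsigned));
         if (!tmp)
            return; /* out of memory: keep the old list, drop this slot */
         list->slots = tmp;
         list->capacity = new_cap;
      }
      list->slots[list->count++] = idx;
   }

   /* While walking the remap table (at allocation time or inside the
    * existing find loop), record each hole once; the next lookup can
    * then pop a free index instead of rescanning from the start. */

The point is simply that holes found during one scan don't have to be
rediscovered on the next.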
>>>>> I see you pushed this patch without concluding this discussion.
>>>>> Certification may be something that you (personally, as a company,
>>>>> whatever) are striving for, but that doesn't mean that you get to
>>>>> ignore reviewer feedback.
>>>>
>>>> I'm sorry if you have that impression, but I'm not ignoring review
>>>> feedback. I agree that the find function is not optimal; I have
>>>> planned how to optimize it, and I'm happy with any changes if someone
>>>> wants to optimize and refactor it instead. However, I've found that
>>>> this is not a bottleneck and it is a cold path, so because of the
>>>> schedule I'm asking to do this later.
>>>>
>>>>> Perhaps in the end you're actually right, I don't know, but we
>>>>> certainly didn't agree on anything. I'm inclined to push out a revert
>>>>> while this is being sorted out.
>>>>
>>>> I'm surprised to see this as such a big deal.
>>>>
>>>> // Tapani
>>>>
>>> The big deal is pushing the patch before concluding the discussion.
>>>
>>> Getting back to the matter at hand, what's the absolute worst case
>>> here? How big does the UniformRemapTable get? How many times can this
>>> function get called?
>>
>> As an example, on Intel Haswell the maximum is 98304; that is the
>> biggest size with HSW.
>>
>> This function gets called only if the remap table has 'holes' in it,
>> meaning that explicit uniform locations are scattered across the
>> available space. I consider it very rare for anyone, or any engine, to
>> do that. It can only really happen if you use both explicit locations
>> (non-contiguous locations) and implicit locations together.
> So... what's the worst case? What would that test look like? How long
> would it take to execute?
A shader that has the maximum possible number of uniforms. They need to
be individual (not arrays or structs); declare half of them with explicit
locations so that they fill every other location (leaving as many holes
as possible). That should be the worst case.
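
Roughly, such a shader could be generated like this (a hypothetical
standalone helper, not an existing Mesa or piglit test):

   #include <stdio.h>

   /* Emit GLSL uniform declarations for the worst case sketched above:
    * individual uniforms, explicit locations pinning every other slot,
    * implicit ones forced into the remaining holes. Illustrative only. */
   static void
   emit_worst_case_uniforms(FILE *f, unsigned max_locations)
   {
      for (unsigned i = 0; i < max_locations / 2; i++)
         fprintf(f, "layout(location = %u) uniform float u_explicit_%u;\n",
                 i * 2, i);
      for (unsigned i = 0; i < max_locations / 2; i++)
         fprintf(f, "uniform float u_implicit_%u;\n", i);
   }

   int
   main(void)
   {
      emit_worst_case_uniforms(stdout, 16);
      return 0;
   }

Every implicit uniform then forces the linker to search for the next
free location between the explicitly placed ones.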
> The fact that it's rare isn't that interesting to me. You put in a
> very slow algorithm when a faster one isn't considerably harder to do.
> Basically a linked list of free/used areas... search through them to
> find a block of the appropriate size, and split it into used/non-used
> sections (merging with adjacent free areas when releasing). This is
> what nouveau_heap implements, and it is fully reusable (if moved). Or
> the mesa/main/mm implementation, which at first glance does the same
> thing, but I'm not 100% sure.
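
For reference, the kind of block-based free list being described would
look roughly like this (a sketch in the spirit of nouveau_heap, with
made-up names; not that code):

   #include <stdbool.h>
   #include <stdlib.h>

   /* One [start, start+size) range of uniform locations, free or used. */
   struct range {
      unsigned start;
      unsigned size;
      bool free;
      struct range *next;
   };

   /* Find a free range of at least 'size' slots, split off the unused
    * tail, and return the start index, or -1 if nothing fits. */
   static int
   range_alloc(struct range *head, unsigned size)
   {
      for (struct range *r = head; r; r = r->next) {
         if (!r->free || r->size < size)
            continue;
         if (r->size > size) {
            struct range *tail = malloc(sizeof(*tail));
            if (!tail)
               return -1;
            tail->start = r->start + size;
            tail->size = r->size - size;
            tail->free = true;
            tail->next = r->next;
            r->next = tail;
            r->size = size;
         }
         r->free = false;
         return (int)r->start;
      }
      return -1;
   }

Freeing would mark a range as free again and merge it with adjacent free
neighbours, so each search walks ranges rather than every table entry.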
>
> But perhaps the worst case isn't as bad as I think it is. So what
> would a worst case shader/usage have to look like? Shouldn't be too
> difficult to write and benchmark, and if it's still fairly fast, that
> would counter my performance argument quite nicely as well.
Sorry, but I don't feel I need to prove anything here; I haven't yet seen
any actual proof that my algorithm is too slow. When we have the rest of
the defects fixed I will get back to this, as well as to the other planned
cleanups for the program resource lists.
// Tapani