[Mesa-dev] [PATCH v2] glsl: move uniform calculation to link_uniforms

Wed Jan 20 01:31:22 PST 2016

On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.palli at intel.com> wrote:
> On 01/20/2016 11:16 AM, Ilia Mirkin wrote:
>>
>> On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.palli at intel.com>
>> wrote:
>>>
>>> On 01/20/2016 10:26 AM, Ilia Mirkin wrote:
>>>>
>>>> On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.palli at intel.com>
>>>> wrote:
>>>>>
>>>>> On 01/19/2016 01:14 PM, Ilia Mirkin wrote:
>>>>>>
>>>>>> The data structure is a (memory) heap... there appears to be one in
>>>>>> mesa/main/mm.h. There's also one in nouveau_heap.h which is quite
>>>>>> simple and totally unreliant on nouveau, just happens to be there. How
>>>>>> hard would it be to integrate something like that?
>>>>>>
>>>>>> The trouble with adding slow things is that you forget about them, and
>>>>>> they're not _that_ slow, but this stuff adds up.
>>>>>
>>>>>
>>>>> The solution I had in mind is to build a list of empty slots when
>>>>> allocating remaptable or while finding slots (keep pushing unused empty
>>>>> slots to list) ... but if possible I would prefer optimization later.
>>>>> First of all, this is quite exotic path to hit with a real program (last
>>>>> words ... yes yes). Secondly, and more importantly, we can apply for
>>>>> certification sooner, there are very few failures left.
>>>>
>>>> I see you pushed this patch without concluding this discussion.
>>>> Certification may be something that you (personally, as a company,
>>>> whatever) are striving for, but that doesn't mean that you get to
>>>> ignore reviewer feedback.
>>>
>>>
>>> I'm sorry if you have that impression but I'm not ignoring review
>>> feedback. I agree that the find function is not 'optimal' and have
>>> planned how to optimize it and I'm happy with any changes if someone
>>> wants to optimize and refactor it instead. However, I've noticed this to
>>> be not a bottleneck and cold path so because of the schedule I'm asking
>>> to do this later.
>>>
>>>> Perhaps in the end you're actually right, I don't know, but we
>>>> certainly didn't agree on anything. I'm inclined to push out a revert
>>>> while this is being sorted out.
>>>
>>>
>>> I'm surprised to see this as such a big deal.
>>>
>>> // Tapani
>>>
>> The big deal is pushing the patch before concluding the discussion.
>>
>> Getting back to the matter at hand, what's the absolute worst case
>> here? How big does the UniformRemapTable get? How many times can this
>> function get called?
>
>
> As example with Intel Haswell we have max as 98304, this is the biggest size
> with HSW.
>
> This function gets called only if the remaptable has 'holes' in it, meaning
> that explicit uniforms locations get scattered in this available space, I
> consider this very rare for anyone or some engine to do. It could only
> really happen if you use both explicit locations (non continuous locations)
> and implicit locations together.

So... what's the worst case? What would that test look like? How long
would it take to execute?

The fact that it's rare isn't that interesting to me. You put in a
very slow algorithm when a faster one isn't considerably harder to do.
Basically a linked list of free/used areas... search through them to
find a block of the appropriate size, and split it into used/non-used
sections (combining with adjacent areas). This is what nouveau_heap
implements, and is fully reusable (if moved). Or the mesa/main/mm impl
which at first glance implements the same thing, but I'm not 100%
sure.
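
For illustration, the free-list idea described above can be sketched roughly
as follows. This is a minimal sketch under stated assumptions, not the
nouveau_heap or mesa/main/mm code: every name in it (slot_range,
free_list_create, free_list_reserve, free_list_alloc) is hypothetical. It
tracks only the free ranges of the table, sorted by start. Explicit locations
carve their range out of the list, and implicit locations take the first free
range that is large enough.

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is not the nouveau_heap or mm.h API. */
struct slot_range {
   unsigned start;            /* first free slot in this range */
   unsigned size;             /* number of free slots in this range */
   struct slot_range *next;   /* next free range, sorted by start */
};

/* Create a free list covering slots [0, total). */
static struct slot_range *
free_list_create(unsigned total)
{
   struct slot_range *r = malloc(sizeof(*r));
   if (!r)
      return NULL;
   r->start = 0;
   r->size = total;
   r->next = NULL;
   return r;
}

/* Mark [start, start + size) as used, e.g. for an explicit location.
 * Returns 0 on success, -1 if the range is not entirely free. */
static int
free_list_reserve(struct slot_range **head, unsigned start, unsigned size)
{
   if (size == 0)
      return -1;
   for (struct slot_range **p = head; *p; p = &(*p)->next) {
      struct slot_range *r = *p;
      unsigned end = r->start + r->size;
      if (start < r->start || start + size > end)
         continue;                        /* not contained in this range */
      if (start == r->start && size == r->size) {
         *p = r->next;                    /* exact fit: unlink the range */
         free(r);
      } else if (start == r->start) {
         r->start += size;                /* trim the front */
         r->size -= size;
      } else if (start + size == end) {
         r->size -= size;                 /* trim the back */
      } else {
         /* Hole in the middle: split into two free ranges. */
         struct slot_range *tail = malloc(sizeof(*tail));
         if (!tail)
            return -1;
         tail->start = start + size;
         tail->size = end - tail->start;
         tail->next = r->next;
         r->size = start - r->start;
         r->next = tail;
      }
      return 0;
   }
   return -1;
}

/* First-fit allocation of `size` contiguous slots, e.g. for an implicit
 * location. Returns the first slot index, or -1 if nothing fits. */
static int
free_list_alloc(struct slot_range **head, unsigned size)
{
   if (size == 0)
      return -1;
   for (struct slot_range **p = head; *p; p = &(*p)->next) {
      struct slot_range *r = *p;
      if (r->size < size)
         continue;
      unsigned start = r->start;
      if (r->size == size) {
         *p = r->next;                    /* range fully consumed */
         free(r);
      } else {
         r->start += size;                /* take slots from the front */
         r->size -= size;
      }
      return (int)start;
   }
   return -1;
}
```

With this layout each lookup walks the free ranges rather than every slot in
the table, so the cost depends on how fragmented the table is, not on its
size. The sketch leaves out releasing a range and merging it with adjacent
free ranges, which a general-purpose heap would also need.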

But perhaps the worst case isn't as bad as I think it is. So what
would a worst case shader/usage have to look like? Shouldn't be too
difficult to write and benchmark, and if it's still fairly fast, that
would counter my performance argument quite nicely as well.

  -ilia