[Mesa-dev] Ideas on loop unrolling, loop examples, and my GSoC-blog

Fri Jun 5 04:39:06 PDT 2015

2015-06-01 12:15 GMT+02:00 Eero Tamminen <eero.t.tamminen at intel.com>:
> Hi,
>
> On 05/29/2015 07:04 PM, Connor Abbott wrote:
>>
>> On Fri, May 29, 2015 at 6:23 AM, Eero Tamminen
>> <eero.t.tamminen at intel.com> wrote:
>>>
>>> On 05/28/2015 10:19 PM, Thomas Helland wrote:
>>>>
>>>>
>>>> One more thing;
>>>> Is there a limit where the loop body gets so large that we
>>>> want to decide that "gah, this sucks, no point in unrolling this"?
>>>> I imagine as the loops get large there will be a case of
>>>> diminishing returns from the unrolling?
>>>
>>>
>>>
>>> I think only the backend can say something about that. You could, e.g.,
>>> give the backend unrolled and non-unrolled versions and let it decide
>>> which one is better (after applying additional optimizations)...
>>
>>
>> I don't really think it's going to be too good of an idea to do that,
>> mainly because it means you'd be duplicating a lot of work for the
>> normal vs. unrolled versions, and there might be some really large
>> loops where generating the unrolled version is going to kill your CPU
>> -- doing any amount of work that's proportional to the number of times
>> the loop runs, without any limit, seems like a recipe for disaster.
>
>
> Sure, it should have sanity bounds, but my point was more that it depends on
> many factors, and even the backend doesn't necessarily know all the factors
> up front, because some of them depend on the passes done by the
> backend.
>
>
>> In GLSL IR, we've been fairly lax about figuring out when unrolling is
>> helpful and unhelpful -- we just have a simple "node count" plus a
>> threshold (as well as a few other heuristics). In NIR, we could
>> similarly have an instruction count plus a threshold and port over the
>> heuristics to whatever extent possible.  We do have some logic for
>> figuring out if an array access is constant after unrolling, and it
>> seems like we'd want to keep that around. The next level of
>> sophistication, I guess, is to give the backend a callback to give an
>> estimation of the execution cost of certain operations. For example,
>> unless a negate/absolute value instruction is used by something that
>> can't handle the modifier, then on i965 the cost of those instructions
>> would be 0. I think that would get us most of the way there to
>> something accurate, without needing to do an undue amount of work (in
>> terms of CPU time and man-effort).
>
>
> Some factors affecting whether to unroll or not:
> - which one can turn pull constants into push constants
> - which one allows using a higher SIMD mode
> - which one can do better latency compensation / scheduling
>   for memory accesses (e.g. texture fetches)
> - instruction count
> - instruction cache size
> - cycles (when they differ between instructions)
>
> How much of this information does the frontend have, or can it request from
> the backend, without actually compiling both versions?
>
>
>         - Eero
>
> (In an offline compiler, compilation CPU usage would be less of an issue.)
>
>

A framework for the backends to report the cost of each operation
would be helpful not only for loop unrolling but also for things like alg-opt.
I'm not sure if I'll get that far though, so I will prioritize getting things
up and running before poking around too much with thresholds and
heuristics. I imagine I will simply set a constant max_instructions
variable and an instructions_after_unrolling variable, and go with that
until things are working and bug free. Maybe I'll add some thresholds to
compiler_options that the pass can use to decide when unrolling is likely
to be beneficial. There are obviously a lot of things that could
have a say in whether to unroll, or to what extent, but I'm not
sure we want to complicate things too much.
It doesn't seem like loops in shaders are that common.

-Thomas