[Mesa-dev] Flatland

Connor Abbott cwabbott0 at gmail.com
Fri Feb 7 12:54:28 PST 2014


On Fri, Feb 7, 2014 at 3:13 PM, Ian Romanick <idr at freedesktop.org> wrote:
> On 02/06/2014 09:34 PM, Connor Abbott wrote:
>> Hi,
>>
>> So I believe that we can all agree that the tree-based representation
>> that GLSL IR currently uses for shaders needs to go. For the benefit
>> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
>> reiterate some of the problems with it as of now:
>>
>> - All the ir_dereference chains blow up the memory usage, and the
>> constant pointer chasing in the recursive algorithms needed to handle
>> them is not just cache-unfriendly but "cache-mean."
>>
>> - The ir_hierarchical_visitor pattern that we currently use for
>> optimization/analysis passes has to examine every piece of IR, even
>> the irrelevant stuff, making the above problems even worse.
>>
>> - Nobody else does it this way, meaning that the existing well-known
>> optimizations don't apply as much here, and oftentimes we have to
>> write some pretty nasty code in order to make necessary optimizations
>> (like tree grafting).
>>
>> - It turns out that the original advantage of a tree-based IR, to be
>> able to automatically generate pattern-matching code for optimizing
>> certain code patterns, only really matters for CPUs with weird
>> instruction sets with lots of exotic instructions; GPUs tend to be
>> pretty regular and consistent in their ISAs, so being able to
>> pattern-match with trees doesn't help us much here.
>>
>> Finally, it seems like a lot of important SSA-based passes assume that
>> we have a flat IR, and so moving to SSA won't be nearly as beneficial
>> as we would like it to be; we could flatten the IR before doing these
>> passes, but that would make the first problem even worse. So we can't
>> really take advantage of SSA too much either until we have a flat IR.
>>
>> The real issue is, how do we let this transition occur gradually, in
>> pieces, without breaking existing code? Ian proposed one solution at
>> FOSDEM, but here's my idea of another.
>>
>> So, my idea is that rather than slowly introducing changes across the
>> board, we create the IR in its final form in the beginning, write
>
> That's not how you ship production software.  Except in the most dire
> circumstances, we can't just put everything on hold for six months or a
> year while we re-write the world and hope it works out.  We did that
> with the original compiler re-write.  I think it was the right choice,
> but even that was only after quite a lot of debate and soul searching...
> and it was a big risk.
>
> A plan that lets you incrementally make improvements without introducing
> undue risk is almost always the better choice.

Except that's really what this plan is. I don't believe introducing
the new IR would be a six-month task at all; it's more like a couple of
weeks (maybe a few months for me, since I'm doing this in my free
time). The rest of the compiler would be completely unaffected, apart
from perhaps a few trivial changes to how inputs are accessed in e.g.
texture fetches and if conditions. Even that doesn't have to happen at
first: in the beginning, we can just use loads/stores to ir_variables
to talk to the parts of the IR that don't understand registers/SSA
yet. It's even easier than the work I did for SSA, because the changes
to the core IR are roughly as invasive, while the passes that convert
to and from the new form (flatten and unflatten) are simpler.

Doing this will bring immediate benefits, like being able to do
GVN-GCM and other such things without running out of memory and taking
forever. After that, different parts of the compiler can be converted
one at a time while the system stays usable as-is, and each
optimization pass we convert brings another immediate benefit in that
it runs faster. And if we never complete the transition, or realize
early on that we need to change course, that's still OK: we still have
the old IR to fall back on, and we'll have improved things already.
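
To make the flatten direction concrete, here's a rough sketch of what
such a pass does to a single expression tree. Everything in it
(TreeExpr, FlatInstr, Flattener, the string opcodes) is made up for
illustration and is not the actual GLSL IR or the proposed classes. It
only shows each subtree's result landing in a fresh register, so that
later instructions refer to registers instead of child pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree-based expression node.
struct TreeExpr {
    std::string op;        // "var" for a leaf, otherwise an operator name
    std::string var_name;  // only meaningful when op == "var"
    std::vector<std::unique_ptr<TreeExpr>> operands;
};

// Hypothetical flat instruction: one destination register, register sources.
struct FlatInstr {
    unsigned dest;
    std::string op;        // "load_var" or an operator name
    std::string var_name;  // only meaningful for "load_var"
    std::vector<unsigned> srcs;
};

std::unique_ptr<TreeExpr> make_var(const std::string &name) {
    std::unique_ptr<TreeExpr> e(new TreeExpr());
    e->op = "var";
    e->var_name = name;
    return e;
}

std::unique_ptr<TreeExpr> make_binop(const std::string &op,
                                     std::unique_ptr<TreeExpr> a,
                                     std::unique_ptr<TreeExpr> b) {
    std::unique_ptr<TreeExpr> e(new TreeExpr());
    e->op = op;
    e->operands.push_back(std::move(a));
    e->operands.push_back(std::move(b));
    return e;
}

struct Flattener {
    std::vector<FlatInstr> out;  // instructions in emission order
    unsigned next_reg = 0;       // next unused register number

    // Post-order walk: emit the operands first, then the operation itself.
    // Returns the register that holds the value of `e`.
    unsigned flatten(const TreeExpr &e) {
        if (e.op == "var") {
            unsigned r = next_reg++;
            out.push_back({r, "load_var", e.var_name, {}});
            return r;
        }
        std::vector<unsigned> srcs;
        for (const auto &child : e.operands) {
            assert(child);
            srcs.push_back(flatten(*child));
        }
        unsigned r = next_reg++;
        out.push_back({r, e.op, "", srcs});
        return r;
    }
};
```

For (a * b) + c this emits load_var a, load_var b, mul, load_var c,
add, in that order, with the add reading the registers written by the
mul and by the last load.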

In the end, any plan to transition to a flat IR is going to carry some
risk, but I believe that this approach gives the least risk for the
greatest gain up-front, with even less risk than the plan you proposed
at FOSDEM.

>
>> passes to flatten and unflatten the IR, and then piece-by-piece
>> rewrite the rest of the compiler. We're going to have to rewrite a lot
>> of the passes to support SSA in the first place, so why not convert
>> them to a flat IR while we're at it? The benefit of this is that it's
>> much easier to do asynchronously and in parallel; rather than
>> introducing changes to the entire thing at once, several people can
>> convert this and that pass, the frontend, the linker, etc.
>> independently. It would entail some extra overhead during the
>> transition in the form of the flattening and unflattening passes, but
>> I think it would be worth it for the immediate benefits (optimizations
>> like GVN-GCM and CSE made possible, etc.).
>>
>> The first part to be converted would be my passes to convert to and
>> from SSA, so that the compiler optimization part would look like this:
>>
>> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
>> unflatten -> (the old stuff)
>>
>> Then we gradually convert ast_to_hir, various passes, the linker,
>> backends, etc. to this form while now actually having the
>> infrastructure to implement any advanced compiler optimization
>> designed in the last ~15 years or so by more-or-less copying down the
>> pseudocode. Hopefully, then, we can reach a point where we can rip out
>> the old IR and the converters.
>>
>> So what would this new IR look like? Well, here's my 2 cents (in the
>> form of some abridged class definitions, you should get the point...)
>>
>> struct ir_calc_source
>> {
>>     mode; /** < SSA or non-SSA */
>>     union {
>>         ir_calculation *def; /** < for SSA sources */
>>         unsigned int reg; /** < for non-SSA sources */
>>     } src;
>>     unsigned swizzle : 8;
>> };
>>
>> struct ir_calc_dest
>> {
>>     mode; /** < SSA or non-SSA */
>>     union {
>>         unsigned int reg; /** < for non-SSA destinations */
>>
>>         /**
>>          * For SSA destinations. Types are needed here because normally
>>          * they're part of the register, but SSA doesn't have registers.
>>          */
>>         glsl_type *type;
>>     } reg_or_type; /* this name is kinda ugly but couldn't think of anything better. */
>> };
>>
>> /*
>>  * This is Ian's name for it, personally I would vote for
>>  * s/ir_instruction/ir_node/ and call this ir_instruction
>>  */
>>
>> class ir_calculation
>> {
>>     ir_calc_dest dest;
>>     ir_expression_operation op;
>>     unsigned write_mask : 4;
>>     ir_calc_source srcs[4];
>> };
>>
>> class ir_load_var
>> {
>>     ir_calc_dest dest;
>>     ir_variable *src;
>>
>>     /**
>>      * For array and record loads, whether we're loading a specific member
>>      * or the whole thing.
>>      */
>>     bool deref_member;
>>     ir_calc_source array_index; /** < for array loads if deref_member is true */
>>     char *record_index; /** < for structure loads */
>> };
>>
>> class ir_store_var
>> {
>>     ir_variable *dest;
>>     ir_calc_source src;
>>     bool deref_member;
>>     ir_calc_source array_index; /** < for array stores */
>>     char *record_index; /** < for structure stores */
>>     unsigned write_mask : 4;
>> };
>>
>> So ir_variable still exists, but it will only be used for function
>> parameters, shader in/outs and uniforms, and arrays and structures.
>> Registers will be much more lightweight, only requiring a table with
>> each register's type and perhaps uses and definitions. The flattening
>> pass, and later ast_to_hir, will emit loads and stores wherever there
>> is an ir_dereference now, but there will be an ir_variable -> register
>> pass that converts these to moves that will later be eliminated by
>> copy propagation (in SSA form, after converting the registers to SSA
>> writes). This is similar to how LLVM works, with everything starting
>> out allocated on the stack using alloca (equivalent to ir_variables
>> here) and accessed explicitly using loads and stores, but then some of
>> these loads/stores are optimized out.
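
A minimal sketch of the ir_variable -> register idea, under some big
assumptions: the Instr struct, the Op enum and vars_to_registers() are
hypothetical names, not the proposed classes, and every variable is
treated as a plain local that is safe to promote (no arrays,
structures, shader in/outs or uniforms). Each variable gets one
dedicated register, and its loads and stores become moves that copy
propagation could clean up afterwards.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Op { LoadVar, StoreVar, Mov, Add };

// Hypothetical flat instruction; -1 marks an unused register slot.
struct Instr {
    Op op;
    int dest;         // destination register (-1 for StoreVar)
    int src0, src1;   // source registers
    std::string var;  // variable name for LoadVar/StoreVar
};

// Rewrite loads/stores of variables into moves from/to one register per
// variable. Registers for variables are numbered from first_free_reg up.
std::vector<Instr> vars_to_registers(const std::vector<Instr> &in,
                                     int first_free_reg) {
    assert(first_free_reg >= 0);
    std::map<std::string, int> var_reg;
    int next = first_free_reg;

    // Look up the register backing a variable, allocating it on first use.
    auto reg_for = [&](const std::string &v) {
        auto it = var_reg.find(v);
        if (it == var_reg.end())
            it = var_reg.emplace(v, next++).first;
        return it->second;
    };

    std::vector<Instr> out;
    for (const Instr &i : in) {
        if (i.op == Op::LoadVar) {
            // dest = load var   becomes   dest = mov var_register
            out.push_back({Op::Mov, i.dest, reg_for(i.var), -1, ""});
        } else if (i.op == Op::StoreVar) {
            // store var, src    becomes   var_register = mov src
            out.push_back({Op::Mov, reg_for(i.var), i.src0, -1, ""});
        } else {
            out.push_back(i);
        }
    }
    return out;
}
```

A real pass would also have to decide which variables qualify, since
the ones listed above have to stay as ir_variables with explicit loads
and stores.
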
>>
>> What do you guys think about this? If everybody thinks this is a good
>> idea, I can write up a patch series that implements the basic concept
>> as well as the flatten and unflatten passes.
>>
>> Connor