[Mesa-dev] Flatland

Thu Feb 6 21:34:44 PST 2014

Hi,

So I believe that we can all agree that the tree-based representation
that GLSL IR currently uses for shaders needs to go. For the benefit
of those who didn't watch Ian Romanick's talk at FOSDEM, I'll
reiterate some of the problems with it as of now:

- All the ir_dereference chains blow up the memory usage, and the
constant pointer chasing in the recursive algorithms needed to handle
them is not just cache-unfriendly but "cache-mean."

- The ir_hierarchical_visitor pattern that we currently use for
optimization/analysis passes has to examine every piece of IR, even
the irrelevant stuff, making the above problems even worse.

- Nobody else does it this way, meaning that the existing well-known
optimizations don't apply as much here, and oftentimes we have to
write some pretty nasty code in order to make necessary optimizations
(like tree grafting).

- It turns out that the original advantage of a tree-based IR, to be
able to automatically generate pattern-matching code for optimizing
certain code patterns, only really matters for CPUs with weird
instruction sets with lots of exotic instructions; GPUs tend to be
pretty regular and consistent in their ISAs, so being able to
pattern-match with trees doesn't help us much here.

Finally, it seems like a lot of important SSA-based passes assume that
we have a flat IR, and so moving to SSA won't be nearly as beneficial
as we would like it to be; we could flatten the IR before doing these
passes, but that would make the first problem even worse. So we can't
really take advantage of SSA too much either until we have a flat IR.

The real issue is, how do we let this transition occur gradually, in
pieces, without breaking existing code? Ian proposed one solution at
FOSDEM, but here's my idea of another.

So, my idea is that rather than slowly introducing changes across the
board, we create the IR in its final form in the beginning, write
passes to flatten and unflatten the IR, and then piece-by-piece
rewrite the rest of the compiler. We're going to have to rewrite a lot
of the passes to support SSA in the first place, so why not convert
them to a flat IR while we're at it? The benefit of this is that it's
much easier to do asynchronously and in parallel; rather than
introducing changes to the entire thing at once, several people can
convert this and that pass, the frontend, the linker, etc.
independently. It would entail some extra overhead during the
transition in the form of the flattening and unflattening passes, but
I think it would be worth it for the immediate benefits (optimizations
like GVN-GCM and CSE made possible, etc.).

The first part to be converted would be my passes to convert to and
from SSA, so that the compiler optimization part would look like this:

flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
unflatten -> (the old stuff)
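
To make the "flatten" step concrete, here is a minimal sketch of what
such a pass does, using a toy expression tree rather than the real
GLSL IR. Everything in it (tree_node, flat_instr, leaf, binop,
flatten, eval_flat) is a made-up name for illustration only: a
post-order walk emits each operand before the instruction that uses
it, so the result is a linear array where sources are just indices of
earlier instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy tree IR (hypothetical, NOT Mesa's ir_expression): every operand is a
// separately allocated node reached through a pointer.
struct tree_node {
    char op;                              // '+', '*', or 'k' for a constant leaf
    float value;                          // only meaningful when op == 'k'
    std::unique_ptr<tree_node> lhs, rhs;  // null for leaves
};

std::unique_ptr<tree_node> leaf(float v)
{
    return std::unique_ptr<tree_node>(new tree_node{'k', v, nullptr, nullptr});
}

std::unique_ptr<tree_node> binop(char op, std::unique_ptr<tree_node> l,
                                 std::unique_ptr<tree_node> r)
{
    return std::unique_ptr<tree_node>(
        new tree_node{op, 0.0f, std::move(l), std::move(r)});
}

// Toy flat IR (also hypothetical): one contiguous array, sources are indices
// of earlier instructions instead of pointers to child nodes.
struct flat_instr {
    char op;
    float value;
    std::size_t src0, src1;  // unused for constants
};

// Post-order walk: operands are appended before the instruction that reads
// them. Returns the index of the instruction computing this subtree.
std::size_t flatten(const tree_node &n, std::vector<flat_instr> &out)
{
    std::size_t a = 0, b = 0;
    if (n.op != 'k') {
        a = flatten(*n.lhs, out);
        b = flatten(*n.rhs, out);
    }
    out.push_back(flat_instr{n.op, n.value, a, b});
    return out.size() - 1;
}

// A pass over the flat form is a single linear loop, no recursion needed.
float eval_flat(const std::vector<flat_instr> &prog)
{
    if (prog.empty())
        return 0.0f;
    std::vector<float> results(prog.size());
    for (std::size_t i = 0; i < prog.size(); i++) {
        const flat_instr &in = prog[i];
        if (in.op == 'k')
            results[i] = in.value;
        else if (in.op == '+')
            results[i] = results[in.src0] + results[in.src1];
        else
            results[i] = results[in.src0] * results[in.src1];
    }
    return results.back();
}
```

Flattening 2 * 3 + 4 this way gives five instructions (const, const,
mul, const, add), with the add at the end reading the mul and the
last constant by index. An unflatten pass would do the reverse,
rebuilding a tree from the instruction that produces each source.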

Then we gradually convert ast_to_hir, various passes, the linker,
backends, etc. to this form while now actually having the
infrastructure to implement any advanced compiler optimization
designed in the last ~15 years or so by more-or-less copying down the
pseudocode. Hopefully, then, we can reach a point where we can rip out
the old IR and the converters.

So what would this new IR look like? Well, here's my 2 cents (in the
form of some abridged class definitions, you should get the point...)

struct ir_calc_source
{
    mode; /**< SSA or non-SSA */
    union {
        ir_calculation *def; /**< for SSA sources */
        unsigned int reg; /**< for non-SSA sources */
    } src;
    unsigned swizzle : 8;
};

struct ir_calc_dest
{
    mode; /**< SSA or non-SSA */
    union {
        unsigned int reg; /**< for non-SSA destinations */

        /**
         * For SSA destinations. Types are needed here because normally
         * they're part of the register, but SSA doesn't have registers.
         */
        glsl_type *type;
    } reg_or_type; /* this name is kinda ugly but couldn't think of
                    * anything better. */
};

/*
 * This is Ian's name for it, personally I would vote for
 * s/ir_instruction/ir_node/ and call this ir_instruction
 */

class ir_calculation
{
    ir_calc_dest dest;
    ir_expression_operation op;
    unsigned write_mask : 4;
    ir_calc_source srcs[4];
};

class ir_load_var
{
    ir_calc_dest dest;
    ir_variable *src;

    /**
     * For array and record loads, whether we're loading a specific
     * member or the whole thing.
     */
    bool deref_member;
    ir_calc_source array_index; /**< for array loads if deref_member is true */
    char *record_index; /**< for structure loads */
};

class ir_store_var
{
    ir_variable *dest;
    ir_calc_source src;
    bool deref_member;
    ir_calc_source array_index; /**< for array stores */
    char *record_index; /**< for structure stores */
    unsigned write_mask : 4;
};
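
As an illustration of how the swizzle and write_mask bitfields above
could be interpreted, here is a small sketch. The encoding is my
assumption, not something fixed by the definitions: 2 bits per
component for the 8-bit swizzle (0..3 meaning x, y, z, w), and one bit
per destination component for the 4-bit write mask. The helper names
(pack_swizzle, swizzle_channel, masked_move) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED encoding: destination component i reads source channel
// (swizzle >> (2 * i)) & 3, where 0..3 mean x, y, z, w.
constexpr uint8_t pack_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
    return static_cast<uint8_t>((x & 3u) | ((y & 3u) << 2) |
                                ((z & 3u) << 4) | ((w & 3u) << 6));
}

constexpr unsigned swizzle_channel(uint8_t swizzle, unsigned component)
{
    return (swizzle >> (2 * component)) & 3u;
}

// Swizzled, masked move: only components whose bit is set in write_mask
// (ASSUMED: bit 0 = x ... bit 3 = w) are written; the rest keep their value.
void masked_move(float dst[4], const float src[4], uint8_t swizzle,
                 unsigned write_mask)
{
    for (unsigned i = 0; i < 4; i++) {
        if (write_mask & (1u << i))
            dst[i] = src[swizzle_channel(swizzle, i)];
    }
}
```

With this encoding the identity swizzle .xyzw packs to 0xE4, and
something like "dst.xz = src.wzyx" becomes a swizzle of
pack_swizzle(3, 2, 1, 0) with a write mask of 0x5.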

So ir_variable still exists, but it will only be used for function
parameters, shader in/outs and uniforms, and arrays and structures.
Registers will be much more lightweight, only requiring a table with
each register's type and perhaps uses and definitions. The flattening
pass, and later ast_to_hir, will emit loads and stores wherever there
is an ir_dereference now, but there will be an ir_variable -> register
pass that converts these to moves that will later be eliminated by
copy propagation (in SSA form, after converting the registers to SSA
writes). This is similar to how LLVM works, with everything starting
out allocated on the stack using alloca (equivalent to ir_variables
here) and accessed explicitly using loads and stores, and then the
mem2reg pass promotes the allocas it can to SSA values, removing
those loads and stores.
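
To show why the leftover moves are cheap to get rid of, here is a
rough sketch of copy propagation on a toy SSA-style flat IR. It is
not the proposed IR: the instr struct, the opcodes, and the function
names are invented for this example, and it only handles
straight-line code where every definition comes before its uses (no
phi nodes, no control flow). Each instruction defines the value named
by its own index, and -1 marks an unused source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: instruction i defines SSA value i.
enum opcode { OP_CONST, OP_MOV, OP_ADD };

struct instr {
    opcode op;
    int src0, src1;  // indices of earlier instructions, -1 if unused
    float imm;       // only used by OP_CONST
};

// Rewrite every source that names a MOV to name the MOV's own source.
// Because we walk in definition order, a MOV's source has already been
// rewritten by the time a later instruction looks through it, so chains of
// moves collapse in a single pass. Returns the number of rewritten operands.
int copy_propagate(std::vector<instr> &prog)
{
    int rewritten = 0;
    for (instr &in : prog) {
        int *srcs[2] = { &in.src0, &in.src1 };
        for (int *s : srcs) {
            if (*s >= 0 && prog[static_cast<std::size_t>(*s)].op == OP_MOV) {
                *s = prog[static_cast<std::size_t>(*s)].src0;
                rewritten++;
            }
        }
    }
    return rewritten;
}

// Count MOVs that some instruction still reads. After copy propagation this
// drops to zero here, so dead code elimination can delete the moves.
int count_used_movs(const std::vector<instr> &prog)
{
    std::vector<bool> used(prog.size(), false);
    for (const instr &in : prog) {
        const int srcs[2] = { in.src0, in.src1 };
        for (int s : srcs) {
            if (s >= 0 && prog[static_cast<std::size_t>(s)].op == OP_MOV)
                used[static_cast<std::size_t>(s)] = true;
        }
    }
    int n = 0;
    for (std::size_t i = 0; i < used.size(); i++)
        if (used[i])
            n++;
    return n;
}
```

For a program like "%2 = mov %0; %3 = mov %2; %4 = add %3, %1", the
pass rewrites the add to read %0 directly, after which neither move
has any remaining use.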

What do you guys think about this? If everybody thinks this is a good
idea, I can write up a patch series that implements the basic concept
as well as the flatten and unflatten passes.

Connor