[Mesa-dev] Flatland

Fri Feb 7 02:18:14 PST 2014

For those interested, Ian's talk is available here:
http://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm

On Fri, 2014-02-07 at 00:34 -0500, Connor Abbott wrote:
> Hi,
> 
> So I believe that we can all agree that the tree-based representation
> that GLSL IR currently uses for shaders needs to go. For the benefit
> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
> reiterate some of the problems with it as of now:
> 
> - All the ir_dereference chains blow up the memory usage, and the
> constant pointer chasing in the recursive algorithms needed to handle
> them is not just cache-unfriendly but "cache-mean."
> 
> - The ir_hierarchical_visitor pattern that we currently use for
> optimization/analysis passes has to examine every piece of IR, even
> the irrelevant stuff, making the above problems even worse.
> 
> - Nobody else does it this way, meaning that the existing well-known
> optimizations don't apply as much here, and oftentimes we have to
> write some pretty nasty code in order to make necessary optimizations
> (like tree grafting).
> 
> - It turns out that the original advantage of a tree-based IR, to be
> able to automatically generate pattern-matching code for optimizing
> certain code patterns, only really matters for CPUs with weird
> instruction sets with lots of exotic instructions; GPUs tend to be
> pretty regular and consistent in their ISAs, so being able to
> pattern-match with trees doesn't help us much here.
> 
> Finally, it seems like a lot of important SSA-based passes assume that
> we have a flat IR, and so moving to SSA won't be nearly as beneficial
> as we would like it to be; we could flatten the IR before doing these
> passes, but that would make the first problem even worse. So we can't
> really take advantage of SSA too much either until we have a flat IR.
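
To make the tree-versus-flat distinction concrete, here is a minimal sketch of what a flattening pass does. Everything in it (TreeExpr, FlatInstr, leaf, bin, flatten) is a hypothetical stand-in for illustration, not existing GLSL IR code: each tree node becomes one entry in a linear instruction list, and operands refer to earlier entries by index instead of by pointer to a subtree.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a leaf (named value) when op is empty,
// otherwise a binary operation over two subtrees.
struct TreeExpr {
    std::string op;
    std::string name;
    std::unique_ptr<TreeExpr> lhs, rhs;
};

// Hypothetical flat instruction. The value it defines is identified by
// its position in the list; src0/src1 are positions of earlier entries.
struct FlatInstr {
    std::string op;    // "load" for leaves
    std::string name;  // leaf name, only used by "load"
    int src0, src1;    // -1 when unused
};

std::unique_ptr<TreeExpr> leaf(std::string name) {
    std::unique_ptr<TreeExpr> e(new TreeExpr());
    e->name = std::move(name);
    return e;
}

std::unique_ptr<TreeExpr> bin(std::string op, std::unique_ptr<TreeExpr> l,
                              std::unique_ptr<TreeExpr> r) {
    std::unique_ptr<TreeExpr> e(new TreeExpr());
    e->op = std::move(op);
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order walk: emit operands first, then the operation that uses
// them. Returns the index of the instruction holding this node's value.
int flatten(const TreeExpr &e, std::vector<FlatInstr> &out) {
    if (e.op.empty()) {
        out.push_back(FlatInstr{"load", e.name, -1, -1});
    } else {
        int a = flatten(*e.lhs, out);
        int b = flatten(*e.rhs, out);
        out.push_back(FlatInstr{e.op, "", a, b});
    }
    return static_cast<int>(out.size()) - 1;
}
```

For (a * b) + c this produces five entries: three loads, a mul reading entries 0 and 1, and an add reading entries 2 and 3. A pass over the result is a plain loop over a vector rather than a recursive visitor.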
> 
> The real issue is, how do we let this transition occur gradually, in
> pieces, without breaking existing code? Ian proposed one solution at
> FOSDEM, but here's my idea of another.
> 
> So, my idea is that rather than slowly introducing changes across the
> board, we create the IR in its final form in the beginning, write
> passes to flatten and unflatten the IR, and then piece-by-piece
> rewrite the rest of the compiler. We're going to have to rewrite a lot
> of the passes to support SSA in the first place, so why not convert
> them to a flat IR while we're at it? The benefit of this is that it's
> much easier to do asynchronously and in parallel; rather than
> introducing changes to the entire thing at once, several people can
> convert this and that pass, the frontend, the linker, etc.
> independently. It would entail some extra overhead during the
> transition in the form of the flattening and unflattening passes, but
> I think it would be worth it for the immediate benefits (optimizations
> like GVN-GCM and CSE made possible, etc.).
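
As an illustration of why a flat SSA form makes CSE cheap, here is a sketch of simple local value numbering over a straight-line instruction list. It is not GVN-GCM, and the Instr layout and the cse function are assumptions made up for this example; it also assumes every operation is pure and that there is a single basic block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical flat SSA instruction: the value it defines is its index
// in the list, and sources are indices of earlier instructions.
struct Instr {
    std::string op;  // e.g. "input", "add", "mul"
    int src0, src1;  // -1 when unused
    int id;          // for "input": which input is read; otherwise 0
};

// For each instruction, return the index of the earliest instruction
// that computes the same value. Later uses can be rewritten to that
// index and the duplicates dropped.
std::vector<int> cse(const std::vector<Instr> &code) {
    std::map<std::tuple<std::string, int, int, int>, int> seen;
    std::vector<int> canon(code.size());
    for (int i = 0; i < static_cast<int>(code.size()); ++i) {
        const Instr &in = code[i];
        // Look at the canonical form of each source, so duplicates of
        // duplicates are found as well.
        int a = in.src0 >= 0 ? canon[in.src0] : -1;
        int b = in.src1 >= 0 ? canon[in.src1] : -1;
        auto key = std::make_tuple(in.op, a, b, in.id);
        auto it = seen.find(key);
        if (it == seen.end()) {
            seen[key] = i;
            canon[i] = i;
        } else {
            canon[i] = it->second;
        }
    }
    return canon;
}
```

The whole pass is one forward loop and a map lookup per instruction, with no tree matching or grafting involved.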
> 
> The first part to be converted would be my passes to convert to and
> from SSA, so that the compiler optimization part would look like this:
> 
> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
> unflatten -> (the old stuff)
> 
> Then we gradually convert ast_to_hir, various passes, the linker,
> backends, etc. to this form while now actually having the
> infrastructure to implement any advanced compiler optimization
> designed in the last ~15 years or so by more-or-less copying down the
> pseudocode. Hopefully, then, we can reach a point where we can rip out
> the old IR and the converters.
> 
> So what would this new IR look like? Well, here's my 2 cents (in the
> form of some abridged class definitions; you should get the point...)
> 
> struct ir_calc_source
> {
>     mode; /** < SSA or non-SSA */
>     union {
>         ir_calculation *def; /** < for SSA sources */
>         unsigned int reg; /** < for non-SSA sources */
>     } src;
>     unsigned swizzle : 8;
> };
> 
> struct ir_calc_dest
> {
>     mode; /** < SSA or non-SSA */
>     union {
>         unsigned int reg; /** < for non-SSA destinations */
> 
>         /**
>          * For SSA destinations. Types are needed here because normally
>          * they're part of the register, but SSA doesn't have registers.
>          */
>         glsl_type *type;
>     } reg_or_type; /* this name is kinda ugly, but I couldn't think of
>                     * anything better. */
> };
> 
> /*
>  * This is Ian's name for it; personally I would vote for
>  * s/ir_instruction/ir_node/ and call this ir_instruction.
>  */
> 
> class ir_calculation
> {
>     ir_calc_dest dest;
>     ir_expression_operation op;
>     unsigned write_mask : 4;
>     ir_calc_source srcs[4];
> };
> 
> class ir_load_var
> {
>     ir_calc_dest dest;
>     ir_variable *src;
> 
>     /**
>      * For array and record loads, whether we're loading a specific
>      * member or the whole thing.
>      */
>     bool deref_member;
>     ir_calc_source array_index; /** < for array loads if deref_member is true */
>     char *record_index; /** < for structure loads */
> };
> 
> class ir_store_var
> {
>     ir_variable *dest;
>     ir_calc_source src;
>     bool deref_member;
>     ir_calc_source array_index; /** < for array stores */
>     char *record_index; /** < for structure stores */
>     unsigned write_mask : 4;
> };
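
For anyone who wants to poke at the layout, here is a compilable sketch of the source/destination structs above. Several things in it are my assumptions, not part of the proposal: the ir_calc_mode enum (the abridged definitions leave the type of mode unspecified), an int stand-in for ir_expression_operation, the helper functions, and the reading of the 8-bit swizzle as four 2-bit component selectors.

```cpp
#include <cassert>

struct glsl_type;       // opaque stand-in for the real type
struct ir_calculation;  // defined below

// Assumed tag type; the proposal only says "SSA or non-SSA".
enum ir_calc_mode { ir_calc_ssa, ir_calc_reg };

struct ir_calc_source {
    ir_calc_mode mode;
    union {
        ir_calculation *def;  // SSA: the defining instruction
        unsigned int reg;     // non-SSA: register number
    } src;
    unsigned swizzle : 8;
};

struct ir_calc_dest {
    ir_calc_mode mode;
    union {
        unsigned int reg;       // non-SSA destination
        const glsl_type *type;  // SSA destination carries its own type
    } reg_or_type;
};

struct ir_calculation {
    ir_calc_dest dest;
    int op;  // stand-in for ir_expression_operation
    unsigned write_mask : 4;
    ir_calc_source srcs[4];
};

// Assumed encoding: 2 bits per component, x in the low bits.
inline unsigned pack_swizzle(unsigned x, unsigned y, unsigned z, unsigned w) {
    return (x & 3) | ((y & 3) << 2) | ((z & 3) << 4) | ((w & 3) << 6);
}

inline unsigned swizzle_component(unsigned swizzle, unsigned i) {
    return (swizzle >> (2 * i)) & 3;
}

inline ir_calc_source ssa_source(ir_calculation *def, unsigned swizzle) {
    ir_calc_source s;
    s.mode = ir_calc_ssa;
    s.src.def = def;
    s.swizzle = swizzle;
    return s;
}

inline ir_calc_source reg_source(unsigned reg, unsigned swizzle) {
    ir_calc_source s;
    s.mode = ir_calc_reg;
    s.src.reg = reg;
    s.swizzle = swizzle;
    return s;
}
```

The point of the tagged union is that a source costs one pointer-sized word plus a tag and a swizzle, with no separately allocated dereference or swizzle node behind it.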
> 
> So ir_variable still exists, but it will only be used for function
> parameters, shader in/outs and uniforms, and arrays and structures.
> Registers will be much more lightweight, only requiring a table with
> each register's type and perhaps uses and definitions. The flattening
> pass, and later ast_to_hir, will emit loads and stores wherever there
> is an ir_dereference now, but there will be an ir_variable -> register
> pass that converts these to moves that will later be eliminated by
> copy propagation (in SSA form, after converting the registers to SSA
> writes). This is similar to how LLVM works, with everything starting
> out allocated on the stack using alloca (equivalent to ir_variables
> here) and accessed explicitly using loads and stores, but then some of
> these loads/stores are optimized out.
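
Here is a rough sketch of the effect that the variable-to-register pass plus copy propagation would have, collapsed into a single forward walk. It is deliberately limited: straight-line code only, each register written once, and the caller says which variables are local temporaries that are safe to promote. The Instr layout, the Kind enum, and promote_vars are all hypothetical names for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum Kind { LOAD_VAR, STORE_VAR, ADD };

// Hypothetical flat instruction over numbered registers.
struct Instr {
    Kind kind;
    int dest;         // register written (LOAD_VAR, ADD), else -1
    int src0, src1;   // registers read, else -1
    std::string var;  // variable name for LOAD_VAR / STORE_VAR
};

// Forward stored values to later loads of the same local variable and
// drop the loads/stores that become redundant. Loads and stores of
// variables not listed in `locals` (inputs, outputs, ...) are kept.
std::vector<Instr> promote_vars(const std::vector<Instr> &code,
                                const std::set<std::string> &locals) {
    std::map<std::string, int> current;  // local var -> register with its value
    std::map<int, int> copy_of;          // removed load dest -> replacement
    std::vector<Instr> out;
    auto resolve = [&](int r) {
        auto it = copy_of.find(r);
        return it == copy_of.end() ? r : it->second;
    };
    for (const Instr &in : code) {
        Instr n = in;
        if (n.src0 >= 0) n.src0 = resolve(n.src0);
        if (n.src1 >= 0) n.src1 = resolve(n.src1);
        bool local = locals.count(in.var) != 0;
        if (in.kind == STORE_VAR && local) {
            current[in.var] = n.src0;  // remember the value, drop the store
        } else if (in.kind == LOAD_VAR && local && current.count(in.var)) {
            copy_of[in.dest] = current[in.var];  // the load is just a copy
        } else {
            out.push_back(n);
        }
    }
    return out;
}
```

With control flow this needs phi nodes and the real SSA conversion, which is where the proposed to-SSA pass would come in; the sketch only shows the straight-line case.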
> 
> What do you guys think about this? If everybody thinks this is a good
> idea, I can write up a patch series that implements the basic concept
> as well as the flatten and unflatten passes.
> 
> Connor